Abstract

A  sequence  of  images  acquired  by  a  moving  sensor  contains  information  about  the  three-dimensional 
motion  of  the  sensor  and  the  shape  of  the  imaged  scene.  Interesting  research  during  the  past  few 
years  has  attempted  to  characterize  the  errors  that  arise  in  computing  3D  motion  (egomotion 
estimation)  as  well  as  the  errors  that  result  in  the  estimation  of  the  scene’s  structure  (structure 
from  motion).  Previous  research  is  characterized  by  the  use  of  optic  flow  or  correspondence  of 
features  in  the  analysis  as  well  as  by  the  employment  of  particular  algorithms  and  models  of  the 
scene  in  recovering  expressions  for  the  resulting  errors.  This  paper  presents  a  geometric  framework 
that  characterizes  the  relationship  between  3D  motion  and  shape  when  they  are  both  corrupted  by 
errors.  We  examine  how  the  three-dimensional  space  recovered  by  a  moving  monocular  observer, 
whose  3D  motion  is  estimated  with  some  error,  is  distorted.  We  characterize  the  space  of  distortions 
by  its  level  sets,  that  is,  we  characterize  the  systematic  distortion  via  a  family  of  iso-distortion 
surfaces,  each  of  which  describes  the  locus  over  which  the  depths  of  points  in  the  scene  in  view  are 
distorted  by  the  same  multiplicative  factor.  The  framework  introduced  in  this  way  has  a  number 
of  applications:  Since  the  visible  surfaces  have  positive  depth  (visibility  constraint),  by  analyzing 
the  geometry  of  the  regions  where  the  distortion  factor  is  negative,  that  is,  where  the  visibility 
constraint  is  violated,  we  make  explicit  situations  which  are  likely  to  give  rise  to  ambiguities  in 
motion  estimation,  independent  of  the  algorithm  used.  We  provide  a  uniqueness  analysis  for  3D 
motion  analysis  from  normal  flow.  We  study  the  constraints  on  egomotion,  object  motion  and 
depth  for  an  independently  moving  object  to  be  detectable  by  a  moving  observer,  and  we  offer 
a  quantitative  account  of  the  precision  needed  in  an  inertial  sensor  for  accurate  estimation  of  3D 
motion. 
1 Introduction


Visual  motion  perception  is  one  of  the  most  important  visual  faculties  of  biological  systems. 
It  is  concerned  with  the  inference  of  our  movement  through  the  world  and  the  detection  of 
other  moving  bodies.  It  also  allows  us  to  infer  the  structure  of  the  world  around  us.  The 
apparent ease with which we perform these tasks belies the underlying nontrivial computational issues, as evidenced by the lack, even today, of robust, real-time and accurate algorithms for estimating 3D motion and shape.

The main difficulty faced by such algorithms is the ill-conditioned nature of the problem. While many solutions to this problem have been proposed, using either feature correspondences or flow fields, they do not work well for real, complex scenes, and most of them degrade ungracefully as the quality of the data deteriorates. Many error analyses have been carried out in the past [1, 5, 7, 19, 23, 24]; a recent illuminating and critical survey is presented in [6]. These analyses attempt to model either the errors in the motion estimates or those in the depth estimates, but because of the large number of unknowns in the problem, most of them deal with restricted conditions such as planarity of the scene [1, 5] or unbiasedness of the estimators [5, 24]. Although these analyses are deep and complex, notably absent in all of them is an account of the systematic nature of the errors in the depth estimates that result from errors in the motion estimates. In other words, the highly correlated nature of the depth errors at different spatial locations is not reflected adequately in these analyses. While [19] attempts to capture such systematic relationships using an error covariance matrix of size $9n^2$ for $n$ 3D points, the representation used there does not lend itself to a clear understanding of the relationship involved. For lack of such an analysis, the relationship between the distorted surfaces and the true surfaces is not well understood, except in the case of critical surface pairs [12, 15-17], where the relationship is explicitly stated.

Given that a human's estimation of 3D motion is likely to be imprecise, understanding this distortion relationship is important in explaining various geometrical properties of perceived visual space, especially with regard to its non-veridical aspects. More importantly, the distortion relationship can also be used for studying the invariant aspects of perceived visual space. For instance, we would like to know whether the ordinal relationship between the depths of different points [10, 14, 20] is preserved. As far as motion estimation is concerned, of great computational interest are those regions in space where the distortions are such that the perceived depths become negative. At these points, the visibility constraint is violated: any image point, being visible, cannot lie behind the camera. The visibility constraint is used by a number of recent algorithms [2, 8, 13, 18] to restrict the solution set. Because the visibility constraint is the only basis for motion analysis that does not rely on assumptions and heuristics, it is important to answer questions such as whether the visibility constraint is sufficient (an issue currently only vaguely understood) and how stable the algorithms that employ it are. These questions can be addressed by studying the negative distortion region. In this paper, we propose a new framework, the iso-distortion surfaces, to capture the distortion relationship and address the aforementioned issues. We also examine questions such as the uniqueness of normal flows with respect to the motion that they describe, the quantitative relationship between the field of view of the camera and the accuracy of the translation estimates, and the accuracy required in an inertial sensor for robust estimation of translation.

This paper is organized as follows. Section 2 develops the equations leading to the structure of the iso-distortion surfaces, and describes some of the geometric properties of the distortion space. Section 3 is devoted to the study of the ambiguities that could arise in estimating 3D motion from normal flow by exploiting the structure of the distortion space and, in particular, its negative distortion subset. In this section we also study (a) the relationship between the errors of the various parameters of the 3D motion that are most likely to give rise to ambiguities regarding 3D motion estimation; and (b) the disambiguating power, with regard to obtaining 3D motion solutions, of a surface patch containing differently oriented features, i.e., what we can achieve from a local analysis. This second result provides a quantitative account of how likely it is for a moving observer to detect independent motion as a function of the 3D motion error (difference between egomotion and object motion) and the depth of the scene in view. Section 4, using the insight gained in the previous sections, studies whether one can obtain a unique solution for 3D motion using normal flow. The analysis also shows that motion fields are never ambiguous, a result conjectured in [12] and recently proved in [3]. With a limited field of view, however, ambiguities may arise. An analysis of the positions of the image areas giving rise to ambiguities is provided. Section 5, using the theoretical results obtained in the previous sections, develops constraints on the field of view for accurate 3D motion estimation and gives a quantitative analysis of the precision needed by an inertial sensor for accurate 3D motion estimation (the inertial sensor estimates rotation).

2  The  Iso-Distortion  Framework 

To characterize the distortion of depth due to erroneous motion estimates, we consider those points in 3D space whose estimated depth $\hat{Z}$ is related to the true depth $Z$ by the same multiplicative factor $D$:

$$\hat{Z} = D\,Z$$

The  locus  of  such  points  constitutes  a  surface,  which  we  call  an  iso-distortion  surface.  To 
facilitate  the  pictorial  description  of  these  surfaces  in  the  following  section,  we  slice  them  with 
planes  parallel  to  the  x-Z  plane.  The  curves  thus  obtained  we  call  iso-distortion  contours. 

2.1  Technical  Prerequisites 

We adopt the standard model for image formation, as illustrated in Figure 1, with $(U, V, W)$ and $(\alpha, \beta, \gamma)$ representing respectively the translation and the rotation of the observer in the coordinate system $OXYZ$. As a consequence of the well-known scale ambiguity, only the focus of expansion or FOE $(x_0, y_0)$, given by $\left(\frac{Uf}{W}, \frac{Vf}{W}\right)$, the rotational parameters $(\alpha, \beta, \gamma)$, and the scaled depth $\frac{Z}{W}$ are obtainable from flow information. Without loss of generality, we can set $W = 1$; henceforth $Z$ shall represent the scaled depth, unless explicitly noted otherwise. In the derivation that follows, we assume that these five motion parameters have been estimated, though probably with some errors. Our focus is on the depth estimation stage, where the goal is to describe the distortion in depth as a function of the motion errors. We rearrange the familiar flow equation so that the quantity of interest, the depth $Z$, is on

Figure 1: The image formation model. $OXYZ$ is a coordinate system fixed to the camera. $O$ is the optical center and the positive $Z$-axis is the direction of view. The image plane is located at a focal length of $f$ pixels from $O$ along the $Z$-axis. A point $P$ at $(X, Y, Z)$ in the world produces an image point $p$ at $(x, y)$ on the image plane, where $(x, y)$ is given by $\left(\frac{fX}{Z}, \frac{fY}{Z}\right)$. The instantaneous motion of the camera is given by the translational vector $(U, V, W)$ and the rotational vector $(\alpha, \beta, \gamma)$.


the  left  hand  side: 

$$Z = \frac{(x - x_0,\; y - y_0) \cdot (n_x, n_y)}{u_n - u_r \cdot (n_x, n_y)} \qquad (1)$$

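Here $(n_x, n_y)$ denotes the direction of the vector along which normal flow is measured; depending on circumstances, we may also write it as $(\cos\theta, \sin\theta)$. $u_n$ is the magnitude of the optic flow projected in direction $(n_x, n_y)$, and $u_r$ is the rotational flow vector, given by $\left(\alpha\frac{xy}{f} - \beta\left(\frac{x^2}{f} + f\right) + \gamma y,\; \alpha\left(\frac{y^2}{f} + f\right) - \beta\frac{xy}{f} - \gamma x\right)$. To refer to the various estimates, we use the hat sign $\hat{\ }$ to represent estimated quantities, and the subscript $e$ to represent errors. We also allow for a noise term $N$ in the estimate for normal flow $u_n$:

$$(\hat{x}_0, \hat{y}_0) = (x_0 - x_{0e},\; y_0 - y_{0e})$$

$$(\hat{\alpha}, \hat{\beta}, \hat{\gamma}) = (\alpha - \alpha_e,\; \beta - \beta_e,\; \gamma - \gamma_e)$$

$$\hat{u}_r = u_r - u_{re}$$

$$\hat{u}_n = u_n + N$$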

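Bringing the various estimated quantities into (1), the computed depth becomes

$$\hat{Z} = \frac{(x - \hat{x}_0,\; y - \hat{y}_0) \cdot (n_x, n_y)}{\hat{u}_n - \hat{u}_r \cdot (n_x, n_y)} \qquad (2)$$

Writing $\hat{u}_n$ as $\left(\frac{1}{Z}(x - x_0,\; y - y_0) + u_r\right) \cdot (n_x, n_y) + N$, we obtain

$$\hat{Z} = Z \left[ \frac{(x - \hat{x}_0)\, n_x + (y - \hat{y}_0)\, n_y}{(x - x_0,\; y - y_0) \cdot (n_x, n_y) + u_{re} \cdot (n_x, n_y)\, Z + N Z} \right] \qquad (3)$$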

From equation (3), we can see that $Z$ is distorted by a multiplicative factor given by the term inside the brackets, which we denote as $D$, the distortion factor. Further denoting the components of $u_{re}$ as $(u^x_{re}, u^y_{re})$, we can write $D$ as follows:

$$D = \frac{(x - \hat{x}_0)\, n_x + (y - \hat{y}_0)\, n_y}{\left(x - x_0 + u^x_{re} Z\right) n_x + \left(y - y_0 + u^y_{re} Z\right) n_y + N Z} \qquad (4)$$

Ignoring the noise term for the moment, we see that for any fixed $(n_x, n_y)$ and fixed distortion factor $D$, the above equation is of the form $Z = f(x, y)$ and therefore defines a surface in 3D space, which we term the iso-distortion surface. Henceforth, when we talk about a family of iso-distortion surfaces, it is always with respect to a certain direction $(n_x, n_y)$ defined at every image point. It is important to realize, on the basis of the preceding analysis, that the distortion of depth is different for different directions on the image plane where flow is estimated. This simply means that if one estimates depth from optic flow in the presence of errors, the results may be very different depending on whether the horizontal or vertical component (or any other component) is used!
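
The chain from equation (1) to equation (4) can be checked numerically. The following is a minimal sketch, not the authors' code: the function names and the sample values are hypothetical, and it assumes the rotational flow expression given above with $W = 1$. It relies on the rotational flow being linear in $(\alpha, \beta, \gamma)$, so the error flow $u_{re}$ is the rotational flow of the error vector.

```python
import numpy as np

def rotational_flow(x, y, f, rot):
    """Rotational flow (u_r^x, u_r^y) at image point (x, y) for rotation (alpha, beta, gamma)."""
    a, b, g = rot
    return np.array([a * x * y / f - b * (x * x / f + f) + g * y,
                     a * (y * y / f + f) - b * x * y / f - g * x])

def normal_flow(x, y, Z, f, foe, rot, n):
    """Noise-free normal flow along unit direction n for scaled depth Z (W = 1)."""
    trans = np.array([x - foe[0], y - foe[1]]) / Z
    return (trans + rotational_flow(x, y, f, rot)) @ n

def estimated_depth(x, y, u_n_hat, f, foe_hat, rot_hat, n):
    """Depth computed from measured normal flow and the (erroneous) motion estimate, as in (2)."""
    num = np.array([x - foe_hat[0], y - foe_hat[1]]) @ n
    den = u_n_hat - rotational_flow(x, y, f, rot_hat) @ n
    return num / den

def distortion_factor(x, y, Z, f, foe, foe_err, rot_err, n, noise=0.0):
    """Distortion factor D of equation (4)."""
    foe_hat = np.asarray(foe) - np.asarray(foe_err)
    u_re = rotational_flow(x, y, f, rot_err)  # linear in rotation: flow of the error vector
    num = np.array([x - foe_hat[0], y - foe_hat[1]]) @ n
    den = (np.array([x - foe[0], y - foe[1]]) + u_re * Z) @ n + noise * Z
    return num / den

# Hypothetical sample values, chosen only for illustration.
f, Z = 500.0, 300.0
x, y = 80.0, -40.0
foe, foe_err = (-50.0, 20.0), (-50.0, 10.0)
rot, rot_err = (0.002, -0.001, 0.003), (0.0005, 0.001, 0.0002)

for name, n in (("horizontal", np.array([1.0, 0.0])), ("vertical", np.array([0.0, 1.0]))):
    D = distortion_factor(x, y, Z, f, foe, foe_err, rot_err, n)
    print(name, "D =", D)
```

Running the loop for the two directions shows the point made in the text: the same scene point, with the same motion errors, is assigned a different distortion factor depending on the direction along which the flow is measured.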

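In order to obtain the iso-distortion surfaces in 3D space (i.e., $XYZ$ space) instead of visual space (i.e., $xyZ$ space), we substitute $x = \frac{fX}{Z}$ and $y = \frac{fY}{Z}$ in equation (4). Ignoring the noise term, this gives

$$D\left(\left(\alpha_e XY - \beta_e\left(X^2 + Z^2\right) + \gamma_e YZ\right) n_x + \left(\alpha_e\left(Y^2 + Z^2\right) - \beta_e XY - \gamma_e XZ\right) n_y\right) - \left(\left((1 - D)X - \left(\hat{x}_0 - D x_0\right)\frac{Z}{f}\right) n_x + \left((1 - D)Y - \left(\hat{y}_0 - D y_0\right)\frac{Z}{f}\right) n_y\right) = 0 \qquad (5)$$

Equation (5) describes the iso-distortion surfaces as quadratic surfaces, in the general case hyperboloids. An illustration is given in Figure 2. However, for most of the analysis conducted in this paper, we study the surfaces (and contours) in visual space.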

To throw more light on the nature of these surfaces, we first consider their intersections with planes parallel to the $x$-$Z$ plane, and look at the resultant iso-distortion contours. After understanding the geometrical properties of the iso-distortion contours, we then proceed to describe the iso-distortion surfaces in 3D, and how they are related to each other.

2.2  Iso-distortion  Contours 

In what follows, we first perform a simplification (to be removed later) which, though not theoretically necessary, will allow us to better grasp the geometrical organization of the iso-distortion contours. The simplification basically amounts to ignoring some terms in $(u^x_{re}, u^y_{re})$ that result in secondary effects. In particular, we assume that the system's field of view (FOV) is not large and that the contribution of $\gamma_e$, namely $(\gamma_e y, -\gamma_e x)$, is small compared to that of $\alpha_e$ and $\beta_e$, so that $(u^x_{re}, u^y_{re})$ becomes $(-\beta_e f, \alpha_e f)$, denoted henceforth as $(-\beta_f, \alpha_f)$. This is typically true of most conventional cameras, but as we shall see later, when we abandon this simplification, the effect on the general shape of the iso-distortion contours is minimal anyway. If we now fix $(n_x, n_y)$ to be in the horizontal direction, and again ignore the noise term $N$ for the moment, we can rewrite (4) as follows, assuming $D$ to be non-zero:

$$Z = \frac{D - 1}{D}\,\frac{x}{\beta_f} - \frac{1}{D}\left(\frac{x_{0e}}{\beta_f} + \frac{(D - 1)\,x_0}{\beta_f}\right) \qquad (6)$$

which describes the iso-distortion surfaces as a set of planes perpendicular to the $x$-$Z$ plane. Much of the information that equation (6) contains can thus be visualized by considering a family of iso-distortion contours on a two-dimensional $x$-$Z$ plane. Each family is defined by three parameters: $x_0$ and the two error terms $x_{0e}$ and $\beta_f$. Within each family, a particular $D$ defines an iso-distortion contour. Figure 3 shows several families of iso-distortion contours; Figures 3c and 3d correspond to the special cases of $\beta_f = 0$ and $x_{0e} = 0$ respectively. In the next section, we shall determine the salient geometrical properties of such simplified iso-distortion contours, after which we will re-incorporate the terms that we have ignored.

Figure 2: Iso-distortion surface in $XYZ$-space. Only the part in front of the image plane is shown.
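
As a small illustration of the simplified horizontal case, the sketch below solves equation (4) for $Z$ with $(n_x, n_y) = (1, 0)$, rotational error flow $(-\beta_f, \alpha_f)$ and no noise, and then checks that points on the resulting contour do have the requested distortion factor. The function names are hypothetical; the parameter values are those quoted for Figure 3a.

```python
def contour_depth(x, D, x0, x0e, beta_f):
    """Depth Z of the iso-distortion contour with factor D at image column x.

    Simplified horizontal case: obtained by solving
    D = (x - x0_hat) / (x - x0 - beta_f * Z), with x0_hat = x0 - x0e, for Z.
    """
    if D == 0 or beta_f == 0:
        raise ValueError("the contour formula requires D != 0 and beta_f != 0")
    return ((D - 1.0) * (x - x0) - x0e) / (D * beta_f)

def distortion(x, Z, x0, x0e, beta_f):
    """Distortion factor at (x, Z) in the simplified horizontal case (no noise)."""
    x0_hat = x0 - x0e
    return (x - x0_hat) / (x - x0 - beta_f * Z)

# Parameters quoted for Figure 3a.
x0, x0e, beta_f = -50.0, -50.0, 0.5
for D in (0.4, 1.8):
    Z = contour_depth(100.0, D, x0, x0e, beta_f)
    print(D, Z, distortion(100.0, Z, x0, x0e, beta_f))
```

For a fixed $D$ the depth is linear in $x$, which is why each contour in the simplified model is a straight line in the $x$-$Z$ plane.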

2.3  Geometrical  Properties  of  Iso-Distortion  Contour  Plots 
2.3.1  Negative  distortion  regions 

The depth estimate $\hat{Z}$ becomes negative when the distortion factor $D$ is negative. On the distortion plots, this negative region lies between the $D = 0$ line and the $D = -\infty$ line, represented as -INF on the plot. The size of this region determines the efficacy of the visibility constraint. As remarked earlier, equation (6) is valid only if $D$ is non-zero. Taking this into account, we derive the equations for both the $D = 0$ and the $D = -\infty$ contours as follows:

$$D = 0: \quad x = \hat{x}_0$$

$$D = -\infty: \quad \begin{cases} Z = \dfrac{1}{\beta_f}\,(x - x_0) & \text{if } \beta_f \neq 0 \\ x = x_0 & \text{otherwise} \end{cases}$$


Figure 3: Families of iso-distortion contours in the $x$-$Z$ plane (horizontal axis: $x$), parameterized by $x_0$, $x_{0e}$ and $\beta_f$: (a) $x_0 = -50$, $x_{0e} = -50$, $\beta_f = 0.5$; (b) $x_0 = -50$, $x_{0e} = -50$, $\beta_f = 0.2$; (c) $x_0 = -50$, $x_{0e} = -50$, $\beta_f = 0.0$; (d) $x_0 = -50$, $x_{0e} = 0$, $\beta_f = 0.2$. The number beside each contour denotes the distortion factor $D$ of that contour. INF denotes $\infty$. The contours are spaced $D = 0.2$ apart, except at the region where the magnitude of $D$ becomes very large. The -INF and INF contours coincide.

The $D = 0$ contour is a vertical line, whereas the $D = -\infty$ line has slope given by $\frac{1}{\beta_f}$ and $x$-intercept given by $x_0$. Therefore, for the case where $\beta_f = 0$, the negative distortion region is a vertical band bounded by $x = \hat{x}_0$ and $x = x_0$ (see Figure 3c). As $\beta_f$ deviates from zero, its effect is to rotate the $D = -\infty$ contour about its $x$-intercept $x_0$. The size of the negative distortion region changes accordingly.
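
As a short worked step, the two bounding contours follow directly from equation (4) once the simplification of Section 2.2 is applied. The block below is a sketch under those assumptions (horizontal direction $(n_x, n_y) = (1, 0)$, rotational error flow $(-\beta_f, \alpha_f)$, no noise):

```latex
D = \frac{x - \hat{x}_0}{x - x_0 - \beta_f Z}
\quad\Longrightarrow\quad
\begin{cases}
D = 0 & \text{when } x = \hat{x}_0 \text{ (numerator vanishes)},\\[4pt]
|D| \to \infty & \text{when } x - x_0 - \beta_f Z = 0,\ \text{i.e. } Z = \dfrac{x - x_0}{\beta_f}\ \ (\beta_f \neq 0).
\end{cases}
```

Since $D$ can change sign only across these two contours, the negative distortion region is the set of points where numerator and denominator have opposite signs, that is, where $(x - \hat{x}_0)(x - x_0 - \beta_f Z) < 0$.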

2.3.2 The $D = 1$ contours are horizontal

This is the contour on which the depth estimates are not distorted. The equation of the contour is given by $Z = -\frac{x_{0e}}{\beta_f}$. Together with the $D = 0$ vertical line, it divides the $x$-$Z$ plane into four regions. Ignoring the negative distortion regions, we can roughly characterize two of them as having the effect of overestimating depth, and two of them as underestimating depth. If the location of $\hat{x}_0$ is well outside the image, so that the $D = 0$ contour is not in view, then the $x$-$Z$ plane is divided by the $D = 1$ contour into two half-planes. Depths on one side of the contour are subjected to either contraction only or expansion only (and vice versa for the other side), depending on the errors in the motion estimates. The point where the $D = 0$ and the $D = 1$ contours meet also defines the common intersection point of all the distortion contours. At this point, the distortion factor is undefined.
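
The common intersection point can be verified in one line. Assuming the simplified horizontal-flow contour obtained from equation (4) (with $\hat{x}_0 = x_0 - x_{0e}$ and $D \neq 0$), substituting $x = \hat{x}_0$ gives a depth that does not depend on $D$:

```latex
Z\Big|_{x = \hat{x}_0}
  = \frac{(D - 1)(\hat{x}_0 - x_0) - x_{0e}}{D\,\beta_f}
  = \frac{-(D - 1)\,x_{0e} - x_{0e}}{D\,\beta_f}
  = -\frac{x_{0e}}{\beta_f}.
```

Every contour therefore passes through $(\hat{x}_0, -x_{0e}/\beta_f)$. At that point both the numerator $x - \hat{x}_0$ and the denominator $x - x_0 - \beta_f Z$ of $D$ vanish, which is why the distortion factor is undefined there.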


2.3.3  Flows  in  Other  Directions 

The previous analysis has considered the case of horizontal flow. We now extend the analysis to an arbitrary direction $(n_x, n_y)$. With our simplification, this is particularly easy to handle. By substituting

$$x_0' = (x_0, y_0) \cdot (n_x, n_y)$$

$$x_{0e}' = (x_{0e}, y_{0e}) \cdot (n_x, n_y)$$

$$x' = (x, y) \cdot (n_x, n_y)$$

$$\beta_f' = (\beta_f, -\alpha_f) \cdot (n_x, n_y)$$

we get for any direction

$$Z = \frac{D - 1}{D}\,\frac{x'}{\beta_f'} - \frac{1}{D}\left(\frac{x_{0e}'}{\beta_f'} + \frac{(D - 1)\,x_0'}{\beta_f'}\right)$$

which has the same form as equation (6). Therefore, each direction exhibits its own iso-distortion pattern, characterized by its $x_0'$, $x_{0e}'$ and $\beta_f'$.
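
The substitution can be sketched in a few lines. The helper names below are hypothetical; the code assumes the simplified rotational error flow $(-\beta_f, \alpha_f)$ and no noise, and parameterizes the direction as $(\cos\theta, \sin\theta)$.

```python
import math

def directional_parameters(theta, foe, foe_err, beta_f, alpha_f):
    """Project the simplified-model parameters onto direction (cos theta, sin theta).

    Returns (x0', x0e', beta_f') as defined by the substitutions in the text.
    """
    nx, ny = math.cos(theta), math.sin(theta)
    x0_p = foe[0] * nx + foe[1] * ny
    x0e_p = foe_err[0] * nx + foe_err[1] * ny
    beta_f_p = beta_f * nx - alpha_f * ny  # (beta_f, -alpha_f) . (nx, ny)
    return x0_p, x0e_p, beta_f_p

def distortion_along(theta, x, y, Z, foe, foe_err, beta_f, alpha_f):
    """Distortion factor for flow measured along direction theta (no noise)."""
    nx, ny = math.cos(theta), math.sin(theta)
    x0_p, x0e_p, beta_f_p = directional_parameters(theta, foe, foe_err, beta_f, alpha_f)
    x_p = x * nx + y * ny
    # Same form as the horizontal case, with primed quantities.
    return (x_p - (x0_p - x0e_p)) / (x_p - x0_p - beta_f_p * Z)
```

With `theta = 0` the primed quantities reduce to $x_0$, $x_{0e}$ and $\beta_f$, recovering the horizontal case.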


2.4 Effects of FOV, $\gamma_e$, and Noise

The results of the previous section were derived under the assumptions that the FOV is not large and $\gamma_e$ is small, and with the noise term ignored. We now consider how relaxing these assumptions affects the iso-distortion contours.

• FOV: Figure 4 presents two iso-distortion plots, characterized by the same triplet $(x_0, x_{0e}, \beta_e)$ that defines Figure 3a. Figure 4a has a FOV of 50°, whereas Figure 4b has a FOV of 70°. It can be observed that the iso-distortion contours become curved at the periphery of the image, notably in Figure 4b. This qualification notwithstanding, the topology of the contours remains the same, so that many of the remarks made in Sections 2.3.1 and 2.3.2 are applicable with few modifications.

• Noise: The effect of noise is to change the $\beta_f$ term in equation (6) to $\beta_f - N$; thus we obtain

$$Z = \frac{D - 1}{D}\,\frac{x}{\beta_f - N} - \frac{1}{D}\left(\frac{x_{0e}}{\beta_f - N} + \frac{(D - 1)\,x_0}{\beta_f - N}\right) \qquad (7)$$

Figure 4: Effects of wide field of view on the iso-distortion contours of Figure 3a, for (a) FOV = 50° and (b) FOV = 70°. $x_0 = -50$, $x_{0e} = -50$ and $\beta_e = 0.001$. The different FOVs of (a) and (b) are achieved by using different focal lengths: 500 for FOV = 50° and 350 for FOV = 70°.

Therefore, flows with different noise have different iso-distortion contours. The corresponding effects on the shape of the negative distortion region will be investigated in Section 3.1.2, where we consider the noise sensitivity of motion estimators that are based on the visibility constraint.

• $\gamma_e$: In the horizontal direction, the contribution of $\gamma_e$ is the term $\gamma_e y$. Since $y$ is fixed on any particular horizontal plane where we view the iso-distortion contours, the effect of $\gamma_e$ is to increase or decrease $\beta_f$ by a constant amount to $\beta_f - \gamma_e y$. In other words, the iso-distortion contours in different horizontal planes are governed by different rotational components "$\beta_f$" due to the different $y$'s, but within a particular horizontal plane, the "$\beta_f$" rotational term is fixed. The effect is better visualized in 3D space, which we turn to in the next subsection.
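
The noise and $\gamma_e$ items above both act by shifting the rotational term that appears in the horizontal-flow contours. The sketch below makes that explicit; the function names are hypothetical, and the second-order FOV terms ($x^2/f$ and $\alpha_e xy/f$) are deliberately left out.

```python
def effective_beta_f(beta_e, f, gamma_e=0.0, y=0.0, noise=0.0):
    """Effective rotational term replacing beta_f in the horizontal-flow contours.

    beta_f = beta_e * f; a rotation error gamma_e shifts it by gamma_e * y on
    image row y, and flow noise N shifts it by N.
    """
    return beta_e * f - gamma_e * y - noise

def infinity_contour_slope(beta_e, f, gamma_e=0.0, y=0.0, noise=0.0):
    """Slope dZ/dx of the D = +-infinity contour Z = (x - x0) / beta_eff."""
    beta_eff = effective_beta_f(beta_e, f, gamma_e, y, noise)
    if beta_eff == 0:
        raise ZeroDivisionError("the contour degenerates to the vertical line x = x0")
    return 1.0 / beta_eff
```

Evaluating `infinity_contour_slope` for several rows `y` with a non-zero `gamma_e` gives a different slope on each horizontal plane, which is the row-dependent behaviour described in the $\gamma_e$ item.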

2.5  Iso-distortion  Surfaces 

Figures 5a and 5b illustrate two iso-distortion surfaces, corresponding to $D = 0.4$ and $D = 1.8$ respectively, with respect to the gradient $(n_x, n_y) = (1, 0)$. The same triplet $(x_0, x_{0e}, \beta_e)$ as in Figure 4b is used, and $\alpha_e$ and $\gamma_e$ are set to zero. The effects of $\alpha_e$ and $\gamma_e$ are introduced in Figures 5c and 5d respectively, both of which show the $D = 1.8$ iso-distortion surface. Comparing Figure 5c to Figure 5b, we can see that the effect of the $\alpha_e \frac{xy}{f}$ term on the distortion surface is quite small, whereas comparing Figure 5d to Figure 5b, we can see that the effect of $\gamma_e y$ is quite considerable. In Figure 5d, the effects of a changing "$\beta_f$" are quite evident, namely, a changing slope and a changing intercept in different $x$-$Z$ planes.

Figure 5: Iso-distortion surfaces for gradients in the $x$ direction: (c) $D = 1.8$, $(\alpha_e, \beta_e, \gamma_e) = (0.001, 0.001, 0.0)$; (d) $D = 1.8$, $(\alpha_e, \beta_e, \gamma_e) = (0.0, 0.001, 0.001)$. $x_0 = -50$; $x_{0e} = -50$; FOV = 70°. Note that the $x$-$Z$ planes as presented in the plots become vertical rather than horizontal.

3 What Can the Iso-distortion Framework Tell Us?

The  framework  introduced  here  can  clearly  be  used  to  discover  aspects  of  the  distorted  space 
that  remain  invariant  with  regard  to  a  group  of  transformations  and  thus  to  discover  robust 
shape  representations.  Here,  however,  we  restrict  ourselves  to  a  number  of  computational 
issues  related  to  3D  motion  estimation.  Specifically,  we  address  the  problem  of  motion 
estimation  using  normal  flow  (the  movement  of  the  image  along  a  direction  normal  to  the 
intensity  gradient)  [8,  13].  The  basic  tenet  of  the  normal  flow  approach  (also  known  as  the 
direct  approach)  is  that  local  measurement  of  movement  in  the  image  is  not  sufficient  to 
determine  the  movement  of  the  corresponding  point  in  space,  and  therefore  only  normal  flow 
should be used, from which motion is inferred directly without computing correspondence or
optic  flow.  These  approaches  rely  on  the  visibility  constraint,  that  is,  the  positivity-of-depth 
constraint,  to  find  the  location  of  the  FOE  in  the  image  plane.  Therefore  it  is  important 
to  understand  the  properties  of  the  visibility  constraint.  With  regard  to  the  iso-distortion 
surfaces,  the  region  where  the  visibility  constraint  is  violated  corresponds  to  the  region 
where  the  distortion  factor  is  negative.  Therefore  geometrical  knowledge  of  this  negative 
distortion  region  can  be  used  as  a  basis  on  which  computational  issues  can  be  studied.  By 
understanding  the  geometry  of  the  negative  distortion  region,  we  make  explicit  the  situations 
that  allow  motion  estimation  based  on  the  visibility  constraint  to  succeed.  We  also  study 
related  aspects  such  as  the  robustness  of  such  methods. 

Questions  arising  from  the  use  of  the  visibility  constraint  in  motion  estimation  are 
examined  in  Section  3.1.  The  uncertainty  in  the  solution  is  related  to  the  negative  distortion 
region.  Section  3.1.3  makes  explicit  situations  that  may  give  rise  to  ambiguities  and  Section  4 
is  devoted  to  a  uniqueness  analysis  of  normal  flow.  In  Section  5  the  analysis  is  related  to 
some  practical  implications,  such  as  the  accuracy  required  of  inertial  sensors  for  accurate 
FOE  estimation.  Finally,  in  Section  6,  experiments  are  carried  out  to  determine  the  extent  to 
which  a  low-cost  inertial  sensor  is  effective  for  FOE  estimation.  For  much  of  the  discussion, 
it  suffices  to  use  families  of  iso-distortion  contours  as  representations  of  the  iso-distortion 
surfaces.  Furthermore,  when  generality  is  not  lost,  the  simplified  model  is  used;  that  is,  the 
second-order  effects  in  the  rotational  error  flows  are  ignored. 

3.1  Properties  of  the  Visibility  Constraint 

In  this  section,  we  deal  with  various  computational  aspects  of  using  the  visibility  constraint 
as  a  basis  for  motion  estimation.  First,  the  robustness  aspect  is  addressed  in  Section  3.1.1 
and  Section  3.1.2.  Next,  we  describe  in  an  intuitive  manner  situations  that  are  likely  to  lead 
to  ambiguities;  specifically,  we  consider  two  factors:  types  of  motion  errors  and  the  location 
of  a  surface  patch.  In  Section  4  we  study  the  problem  of  how  the  gradient  distribution  affects 
the  uniqueness  issue. 

3.1.1  The  Localization  of  the  FOE 

In  the  work  by  Horn  and  Weldon  [13],  the  problem  of  determining  the  FOE  given  a  known 
rotation  is  examined.  Suppose  the  known  rotation  is  subtracted  from  the  normal  flow;  the 
remaining  normal  flow  corresponds  to  that  arising  from  a  purely  translational  flow  field.  By 
exploiting  the  visibility  constraint,  the  problem  of  FOE  estimation  can  be  converted  to  a 
constrained  optimization  problem.  The  basic  underlying  idea  is  illustrated  in  Figure  6.  It  is 
clear that if $u_n$ contains only translational flow, the FOE must lie in the shaded half plane
defined  by  the  line  e.  Thus  every  point  in  that  half  plane  receives  a  vote  for  being  the  FOE. 
The  best  solution  for  the  FOE  corresponds  to  the  location  with  the  highest  number  of  votes. 

When  the  flow  field  contains  rotation  that  is  unknown,  or  that  cannot  be  estimated 
accurately,  the  aforementioned  method  must  be  modified  [2,  18].  In  the  work  of  Aloimonos 
et  al.  [2],  it  was  suggested  that  any  normal  flow  whose  magnitude  is  less  than  a  threshold 
T  is  discarded.  The  threshold  T  is  chosen  to  ensure  correctness  of  voting.  This  modified 

Figure 6: The translational flow vector $u_t$ has its tip anywhere along the line $e_2$. The focus of expansion lies in the (shaded) half plane defined by the line $e$ that does not contain possible vectors $u_t$.

voting scheme gives rise to a solution area whose size increases with the magnitude of the unaccounted-for rotation, and whose shape is anisotropic about the true FOE.¹

Those image points at which the visibility constraint is violated correspond to the negative distortion regions on the iso-distortion plots. Thus, the anisotropy of the uncertainty area can be understood in terms of the geometry of the negative distortion region, as we show below. Figure 7 depicts the negative distortion areas of several FOE candidates, each with a different amount of error $x_{0e}$. The negative distortion region is bounded by two contours: the $D = -\infty$ and the $D = 0$ contours, whose equations are given by $Z = \frac{1}{\beta_f}(x - x_0)$ and $x = \hat{x}_0$ respectively. The unaccounted-for rotational flow in all four cases is given by the same $\beta_f$. As a consequence, the equation for the $D = -\infty$ contour is the same for all FOE candidates. The equation for the other bounding contour, the $D = 0$ contour, is different, as it is given by $x = \hat{x}_0$. The resultant negative distortion regions, represented by the shaded areas in Figure 7, clearly indicate that the correct candidate (Figure 7b) does not have the smallest negative distortion region; therefore the correct $x_0$ will usually not be estimated. Furthermore, the anisotropic nature of the uncertainty area is also evident from the figure; in this particular case the estimate of $x_0$ is skewed towards the side with positive $x_{0e}$. In general, the further away the scene is, the larger the estimation error will be. A complete analysis must take into account the normal flow measurements obtained from all gradient directions. As we shall see in Section 3.1.3.2, if the scene in view is approximately fronto-parallel, then the preceding observation still holds even if we utilize normal flow measurements from all gradient directions.
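
The comparison of candidates can be reproduced approximately with a grid count. The sketch below is only illustrative: it uses the simplified horizontal model with the values quoted for Figure 7 ($\beta_f = -0.5$, $x_0 = 50$), and it additionally assumes that scene points are spread uniformly over a box in the $x$-$Z$ plane, whose extent (`x_range`, `z_max`) is an arbitrary choice not taken from the paper.

```python
import numpy as np

def negative_fraction(x0, x0e, beta_f, x_range=(-250.0, 250.0), z_max=600.0, steps=400):
    """Fraction of a uniform (x, Z) grid that falls in the negative distortion region.

    A point has D < 0 when the numerator (x - x0_hat) and the denominator
    (x - x0 - beta_f * Z) of the distortion factor have opposite signs.
    """
    xs = np.linspace(x_range[0], x_range[1], steps)
    zs = np.linspace(z_max / steps, z_max, steps)  # strictly positive depths
    X, Z = np.meshgrid(xs, zs)
    num = X - (x0 - x0e)
    den = X - x0 - beta_f * Z
    return float(np.mean(num * den < 0))

for x0e in (-50.0, 0.0, 50.0, 100.0):
    print(x0e, negative_fraction(50.0, x0e, -0.5))
```

Under these assumptions the correct candidate ($x_{0e} = 0$) does not attain the smallest fraction; candidates with positive $x_{0e}$ score lower, in line with the skew noted above. The size of the effect depends on the assumed depth range.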

By examining the geometry of the negative distortion areas, one can partially compensate for the estimation error if one has some knowledge about how the near and far scene distances are approximately distributed in the image. This is accomplished by noting that the $D = \pm\infty$ contours (they are coincident) pass through $x_0$ on the iso-distortion plots; thus by observing how the very large positive $\hat{Z}$'s and the very large negative $\hat{Z}$'s are juxtaposed with respect to each other in the near ground, one can at least obtain the

Sinclair  et  al.  [18],  while  performing  a  similar  analysis,  reached  the  different  conclusion  that  the  uncer¬ 
tainty  area  will  be  anisotropic  only  if  the  FOE  is  outside  the  image;  otherwise,  the  area  is  isotropic  about  the 
true  FOE.  This  discrepancy  stems  from  their  assumption  that  the  pruned  region  (that  is,  the  region  whose 
normal  flows  are  not  discarded)  is  still  centered  around  the  FOE,  which  in  general  is  not  true. 
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x-Axis  x-Axis 


(a)  xOe  =  -50 


(b)  xOe  =  0 


(c)  xOe  =  50  (d)  xOe  =100 

Figure  7:  The  shaded  area  represents  the  negative  distortion  band.  Errors  in  the  FOE 
estimates  are  expected  since  (b),  the  correct  candidate,  does  not  have  the  smallest  negative 
distortion  area.  /?/  =  —  0.5,  =  50,  zO  =  a:0  —  a;0e. 

direction  of  the  FOE  error.  There  remains,  however,  the  difficulty  connected  with  noise, 
which  we  deal  with  in  the  next  section. 

3.1.2  Effect  of  Noise 

We  shall  show  in  this  section  that  even  if  a  flow  field  contains  no  rotation,  the  presence  of  noise 
considerably  reduces  the  robustness  of  the  voting  method.  This  may  be  expected,  but  what 
is  not  so  obvious  is  that  the  negative  distortion  regions  of  some  incorrect  FOE  candidates 
may  not  increase  in  size  at  all  (probabilistically  speaking).  For  illustration  purposes,  suppose 
the  noise  N  has  the  following  probability  mass  function: 

$P(N = N_{\max})$ = §

$P(N = -N_{\max})$ = §
Figure 8 illustrates the negative distortion regions of two FOE candidates under various
noise conditions and zero rotational error, with the upper row corresponding to the
correct FOE estimate. Recall from equation (7) that the effect of noise $N$ is to replace the
$\beta_f$ term by $\beta_f - N$; in this case, the slope of the $D = -\infty$ contour is simply given by $-\frac{1}{N}$,
since the case under consideration contains no unaccounted-for rotation. The expected area
of the negative distortion region is calculated using the given probability mass function of
the noise $N$ and its size is shown at the end of each row. By comparing these figures to
the negative distortion regions when there is no noise (Figures 8a, d), we see that the effect
of noise, while introducing negative depth estimates for the correct FOE candidate, does
not result in an increase in the size of the negative distortion region of the incorrect FOE
candidate.
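
The expectation described above can be sketched numerically. The snippet below is a minimal illustration, assuming horizontal gradients, constant rotational error flow, and the replacement of $\beta_f$ by $\beta_f - N$ recalled above; the two-point noise distribution with equal weights, the value of $N_{\max}$, and the sampling grid are hypothetical choices made only for the example.

```python
def negative_fraction(x0, x0_hat, beta_eff, xs, Zs):
    """Fraction of sampled (x, Z) points with D < 0 for a horizontal gradient."""
    neg = 0
    for x in xs:
        for Z in Zs:
            num = x - x0_hat
            den = (x - x0) - beta_eff * Z
            if den != 0 and num / den < 0:
                neg += 1
    return neg / (len(xs) * len(Zs))

def expected_negative_fraction(x0, x0_hat, beta_f, pmf, xs, Zs):
    """Average over a noise PMF {N: probability}; noise replaces beta_f by beta_f - N."""
    return sum(p * negative_fraction(x0, x0_hat, beta_f - N, xs, Zs)
               for N, p in pmf.items())

xs = list(range(-100, 101, 5))       # hypothetical image positions
Zs = list(range(10, 201, 10))        # hypothetical scene depths
pmf = {+1.0: 0.5, -1.0: 0.5}         # hypothetical N_max = 1 with equal weights

# Correct FOE candidate (x0_hat == x0) and no rotational error (beta_f = 0).
clean = expected_negative_fraction(0, 0, 0.0, {0.0: 1.0}, xs, Zs)
noisy = expected_negative_fraction(0, 0, 0.0, pmf, xs, Zs)
print("no noise:", clean, " with noise:", noisy)
```

With these assumed values the correct candidate has no negative samples in the noise-free case but acquires some once noise is present, which is the qualitative effect the paragraph describes.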


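[Figure 8 panels, each plotted with x on the horizontal axis and Z on the vertical axis:
(a) $x_{0e} = 0$, $N = 0$; (b) $x_{0e} = 0$, $N = N_{\max}$; (c) $x_{0e} = 0$, $N = -N_{\max}$;
(d) $x_{0e} = -50$, $N = 0$; (e) $x_{0e} = -50$, $N = N_{\max}$; (f) $x_{0e} = -50$, $N = -N_{\max}$]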


Figure  8:  The  “equivalent”  negative  distortion  regions  of  two  FOE  candidates  under  varying 
conditions  of  noise.  The  upper  row  represents  the  negative  distortion  regions  of  the  correct 
candidates.  The  shaded  figures  at  the  end  of  each  row  represent  the  expected  sizes  of  the 
negative  distortion  regions  of  the  respective  candidates  given  the  probability  mass  function 
P(N).  Note  that  they  have  nothing  to  do  with  the  actual  shapes  of  the  negative  distortion 
regions. 

The  form  of  the  noise  distribution  is  not  a  matter  of  primary  importance  here;  our  aim 
is  to  demonstrate  that  the  visibility  constraint  is  susceptible  to  the  influence  of  confounding 
factors  such  as  the  distribution  of  the  scene  features  and  the  statistical  nature  of  the  noise. 
Therefore  its  robustness  should  be  suspect.  A  further  point  can  be  made  from  Figure  8:  For 
the  correct  FOE  candidate  the  positivity  of  distant  points  is  more  susceptible  to  noise.  This 
is related to the observation that the localization of the FOE improves with the magnitude
of the translational flow.


3.1.3  Situations  Giving  Rise  to  Ambiguities 

The  purpose  of  this  section  is  to  make  explicit  those  situations  under  which  the  visibility 
constraint  may  not  be  sufficient  to  discriminate  between  alternative  motion  solutions.  The 
factors  that  give  rise  to  ambiguity  are  examined  from  two  perspectives:  first,  the  types  of 
motion  errors;  second,  the  influence  of  the  location  of  a  surface  patch.  A  more  rigorous 
analysis  of  the  uniqueness  issue  is  not  attempted  until  Section  4,  where  we  try  to  formally 
account  for  the  gradient  distribution. 

3.1.3.1  Types of Motion Errors

We first establish a condition on the motion errors that most likely leads to ambiguities.
It follows from the observation that if the iso-distortion diagram is such that the point at
which its $D = 0$ contour and its $D = -\infty$ contour intersect lies behind the image plane, then
irrespective of the actual scene structure, negative depth estimates will always be obtained,
since for some image point P (see Figure 9a), the space in front of the image region is
entirely spanned by the negative distortion area. However, if the motion errors are such
that this intersection point lies in front of the image plane for all gradient directions, the
likelihood of ambiguity increases considerably. Therefore the conditions on the motion errors
for ambiguity can be stated as, first, $\gamma_e = 0$; and second, the Z-coordinate (the absolute
depth) of the point at which the $D = 0$ contour and its $D = -\infty$ contour intersect should
be such that

$$\frac{(x_{0e}, y_{0e}) \cdot (\cos\theta, \sin\theta)}{(\beta_f, -\alpha_f) \cdot (\cos\theta, \sin\theta)} < 0 \quad \forall\theta$$

This condition can be satisfied only if $W(x_{0e}, y_{0e}) = -\lambda(\beta_f, -\alpha_f)$ for some positive $\lambda$. This
constitutes our condition on the type of motion error that most likely gives rise to ambiguity.

Given  this  condition,  if  the  scene  in  view  is  such  that  it  avoids  the  negative  distortion 
region  for  all  image  points  (for  instance,  point  Q  in  Figure  9b  avoids  the  negative  distortion 
area),  then  ambiguities  arise.  The  regions  within  which  point  Q  can  reside  so  as  to  yield 
ambiguities can be obtained directly from Figure 9b by using the equations for the $D = 0$
and the $D = -\infty$ contours. They are written in their general form for arbitrary gradient
direction $\theta$, so that independent of the gradient direction, no negative depth values occur:

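$$\operatorname{sgn}(\beta'_f)\,Z < \frac{1}{|\beta'_f|}\big((x - x_0)\cos\theta + (y - y_0)\sin\theta\big) \quad \text{if } (x - \hat{x}_0)\cos\theta + (y - \hat{y}_0)\sin\theta > 0 \tag{8}$$

$$\operatorname{sgn}(\beta'_f)\,Z > \frac{1}{|\beta'_f|}\big((x - x_0)\cos\theta + (y - y_0)\sin\theta\big) \quad \text{if } (x - \hat{x}_0)\cos\theta + (y - \hat{y}_0)\sin\theta < 0 \tag{9}$$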

where we have used $\beta'_f$ to denote $(\beta_f\cos\theta - \alpha_f\sin\theta)$ and sgn(·) to denote the sign function.
As long as the scene in view satisfies these constraints, no negative depth estimates will arise.
In actual fact, the scene in view is not composed of spatially unrelated elements where Z
can vary freely to meet the constraints. A scene patch, with gradients in different directions,
and therefore experiencing different constraints on depth, may not be capable of meeting all
of them owing to its smoothness property. The likelihood of such a scene patch giving rise to
negative depth estimates depends on a number of factors, which we investigate in the next
section.

[Figure 9 panel label: (b) $x_{0e} = 50$]

Figure 9: The iso-distortion diagram on the left will always result in negative depth estimates,
regardless of the actual depth distribution. This is so because the whole region in space
between $x = x_0$ and $x = \hat{x}_0$ is spanned by the negative distortion region.
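
The depth constraints (8) and (9) can be checked against the sign of the distortion factor directly. The sketch below assumes $\gamma_e = 0$ and constant rotational error flow, so that for gradient direction $\theta$ the distortion factor is the projection of $(x - \hat{x}_0, y - \hat{y}_0)$ on the gradient divided by the projection of $(x - x_0, y - y_0)$ minus $\beta'_f Z$. The function names and all numerical values are hypothetical.

```python
import math

def projections(x, y, theta, foe, foe_hat, alpha_f, beta_f):
    """Projections on the gradient direction and the rotational term beta'_f."""
    c, s = math.cos(theta), math.sin(theta)
    true_proj = (x - foe[0]) * c + (y - foe[1]) * s          # uses the true FOE
    est_proj = (x - foe_hat[0]) * c + (y - foe_hat[1]) * s   # uses the estimated FOE
    beta_p = beta_f * c - alpha_f * s
    return true_proj, est_proj, beta_p

def satisfies_depth_bound(x, y, Z, theta, foe, foe_hat, alpha_f, beta_f):
    """Constraint (8) or (9), whichever applies; None when neither is defined."""
    true_proj, est_proj, beta_p = projections(x, y, theta, foe, foe_hat, alpha_f, beta_f)
    if est_proj == 0 or beta_p == 0:
        return None
    lhs = math.copysign(1.0, beta_p) * Z
    bound = true_proj / abs(beta_p)
    return lhs < bound if est_proj > 0 else lhs > bound

def distortion_factor(x, y, Z, theta, foe, foe_hat, alpha_f, beta_f):
    """D for gradient direction theta (gamma_e = 0 assumed); undefined on the pole."""
    true_proj, est_proj, beta_p = projections(x, y, theta, foe, foe_hat, alpha_f, beta_f)
    return est_proj / (true_proj - beta_p * Z)

# Hypothetical configuration: true FOE, estimated FOE, rotational error flow.
foe, foe_hat, alpha_f, beta_f = (50.0, 20.0), (0.0, 0.0), 0.2, -0.5
for k in range(-5, 6):
    theta = k * math.pi / 12
    ok = satisfies_depth_bound(30.0, -10.0, 10.0, theta, foe, foe_hat, alpha_f, beta_f)
    D = distortion_factor(30.0, -10.0, 10.0, theta, foe, foe_hat, alpha_f, beta_f)
    print(f"theta = {theta:+.3f}  constraint satisfied: {ok}  D = {D:+.3f}")
```

Under these assumptions a constraint is satisfied exactly when $D > 0$, so the same image point can pass for some gradient directions and fail for others.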


3.1.3.2  Using a Patch

In this section, we study the disambiguating power (with regard to obtaining motion
solutions) of a smooth surface patch, which contains differently oriented features. The analysis
also  provides  a  quantitative  assessment  of  how  likely  it  is  for  a  moving  observer  to  be  able 
to  detect  independent  motion,  as  a  function  of  the  parameters  of  the  problem.  As  discussed 
in  Section  2.3.3,  each  gradient  direction  is  governed  by  different  families  of  iso-distortion 
surfaces.  Therefore  differently  oriented  features  on  the  surface  patch  are  subject  to  different 
constraints  on  depth  in  order  for  the  distortion  factors  to  be  positive.  We  would  like  to 
know  the  likelihood  of  the  resultant  surface  estimate  yielding  some  negative  depth  values 
if  the  feature  gradients  are  distributed  over  a  sufficiently  wide  range.  This  will  indicate 
regions  on  the  image  plane  in  which  the  aggregate  of  normal  flow  field  measurements  is  most 
effective  in  disambiguating  alternative  motion  solutions.  For  the  purpose  of  quantifying  this 
effectiveness,  we  adopt  the  following  measure:  Given  some  errors  in  the  motion  estimates 
and  a  fixed  small  region  in  space,  what  is  the  range  of  directions  in  which  the  gradient  must 
lie so as to yield negative depth estimates? The larger this $\theta$ range is, the more effective the
patch. It suffices to consider the range of gradients in $[-\frac{\pi}{2}, \frac{\pi}{2}]$, since a particular gradient
direction $n$ results in the same $D$ as the opposite gradient direction $-n$.

The  disambiguating  power  of  a  scene  patch  depends  on  a  number  of  factors  such  as 
the  image  location  of  the  patch,  the  actual  depth  of  the  patch,  and  the  range  of  gradient 
directions  on  the  patch.  For  instance,  with  regard  to  gradient  directions,  we  note  that  they 
are  most  effective  around  the  direction  at  which  the  bound  on  Z  (expressions  (8)  and  (9)) 
changes  from  an  upper  bound  to  a  lower  bound.  If  the  gradients  of  the  patch  happen  to  lie 
on  either  side  of  this  direction,  the  depth  of  the  patch  will  be  constrained  to  lie  in  the  range 
given by the upper and the lower bound. Referring to (8) and (9), the direction is given
by $(x - \hat{x}_0)\cos\theta + (y - \hat{y}_0)\sin\theta = 0$. Unfortunately, depths are usually not computed for
such  directions,  since  being  perpendicular  to  lines  emanating  from  the  estimated  FOE,  these 
gradients  are  deemed  to  carry  no  depth  information.  In  spite  of  this  fact,  if  the  range  around 
the  direction  within  which  negative  depth  estimates  arise  is  large,  then  the  patch  becomes 
potentially  useful  in  disambiguating  motion  solutions. 

Since  our  aim  here  is  to  obtain  an  intuitive  picture  of  how  the  size  of  this  range  varies 
over the image plane, we treat the rotational error flows due to $\alpha_e$ and $\beta_e$ as constant. The
analysis is further aided by rotating the x-y coordinate system so that in the new coordinate
system, there is no error in the y-component of the rotational estimate, i.e., $\alpha_e = 0$. The
distortion factor can now be written as

$$D = \frac{x - \hat{x}_0 + (y - \hat{y}_0)\tan\theta}{x - x_0 - (\beta_f - \gamma_e y)Z + (y - y_0 - \gamma_e x Z)\tan\theta}$$


We denote the numerator and denominator by $h(\theta)$ and $k(\theta)$, and the angles $\theta$ at which
they become zero as $\theta_h$ and $\theta_k$ respectively. The distortion factor $D$ becomes negative when
$h(\theta)$ and $k(\theta)$ have different signs. Referring to Figure 10, we have plotted $h(\theta)$ and $k(\theta)$ as
functions of $\tan\theta$, where $\theta$ ranges from $-\frac{\pi}{2}$ to $\frac{\pi}{2}$. It can be observed that the size of the $\theta$
range within which $h(\theta)$ and $k(\theta)$ have different signs can be expressed as follows:


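$$R = \begin{cases} |\theta_h - \theta_k| & \text{if } \dfrac{\partial h(\theta)}{\partial \tan\theta}\,\dfrac{\partial k(\theta)}{\partial \tan\theta} > 0 \\[2ex] \pi - |\theta_h - \theta_k| & \text{otherwise} \end{cases} \tag{10}$$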

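where $R$ is the size of the interval. To compute $R$, for convenience of notation we let
$t = \tan(\theta_h - \theta_k)$. Then the angle $\tan^{-1} t$, ranging from $-\pi$ to $\pi$, is given by

$$\tan^{-1} t = \tan^{-1}\left(\frac{-(x - \hat{x}_0)}{y - \hat{y}_0}\right) - \tan^{-1}\left(\frac{-\big(x - x_0 - (\beta_f - \gamma_e y)Z\big)}{y - y_0 - \gamma_e x Z}\right)$$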

Taking the tangent of both sides, we obtain

$$t\big((y - \hat{y}_0)(y - y_0 - \gamma_e x Z) + (x - \hat{x}_0)(x - x_0 - (\beta_f - \gamma_e y)Z)\big) - \big(x - x_0 - (\beta_f - \gamma_e y)Z\big)(y - \hat{y}_0) + (x - \hat{x}_0)(y - y_0 - \gamma_e x Z) = 0$$
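Introducing the following coordinate translation:

$$x_{\text{old}} = x_{\text{new}} + x_0 - \frac{x_{0e}}{2} = x_{\text{new}} + \hat{x}_0 + \frac{x_{0e}}{2} \tag{11}$$

$$y_{\text{old}} = y_{\text{new}} + y_0 - \frac{y_{0e}}{2} = y_{\text{new}} + \hat{y}_0 + \frac{y_{0e}}{2} \tag{12}$$

and doing away with the subscripts in x and y, we finally obtain the following equation:

$$(t - \gamma_e Z)x^2 + (t - \gamma_e Z)y^2 - \left(t\beta_f Z + y_{0e} + \frac{\gamma_e Z (x_{0e} + t y_{0e})}{2}\right)x + \left(\beta_f Z + x_{0e} + \frac{t\gamma_e Z x_{0e}}{2} - \frac{\gamma_e Z y_{0e}}{2}\right)y + \left(\frac{\beta_f Z y_{0e}}{2} - \frac{t\beta_f Z x_{0e}}{2} - \frac{t x_{0e}^2}{4} - \frac{t y_{0e}^2}{4}\right) = 0 \tag{13}$$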


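which can be shown to give equations of circles for different values of $t$.² All the circles pass
through two points: the estimated FOE $(\hat{x}_0, \hat{y}_0)$, where $h(\theta)$ vanishes for every $\theta$, and the
point where $k(\theta)$ vanishes for every $\theta$ (for $\gamma_e = 0$ this is $(x_0 + \beta_f Z, y_0)$). In particular, the $t = 0$
locus is a straight line defined by the above two points.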


Figure 10: The diagram illustrates the ranges of $\theta$ over which $D$ is negative, given fixed x, y,
and Z. They are indicated by the shaded regions on the horizontal axis.
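
The interval size $R$ of equation (10) can be cross-checked numerically. The sketch below writes $h$ and $k$ as linear functions of $\tan\theta$ (with $\alpha_e = 0$, as in the rotated coordinate system), evaluates the closed form, and compares it with a brute-force count of the directions in which $h$ and $k$ differ in sign. The function names and the sample values are assumptions made for illustration.

```python
import math

def linear_terms(x, y, Z, foe, foe_hat, beta_f, gamma_e):
    """Return (h0, h1, k0, k1) with h = h0 + h1*tan(theta), k = k0 + k1*tan(theta)."""
    h0, h1 = x - foe_hat[0], y - foe_hat[1]
    k0 = x - foe[0] - (beta_f - gamma_e * y) * Z
    k1 = y - foe[1] - gamma_e * x * Z
    return h0, h1, k0, k1

def interval_size(h0, h1, k0, k1):
    """Closed form for R, following equation (10)."""
    if h1 == 0 or k1 == 0:
        raise ValueError("h or k has no zero crossing in (-pi/2, pi/2)")
    theta_h = math.atan(-h0 / h1)
    theta_k = math.atan(-k0 / k1)
    gap = abs(theta_h - theta_k)
    # Equal slope signs: the signs differ only between the two zero crossings.
    return gap if h1 * k1 > 0 else math.pi - gap

def interval_size_numeric(h0, h1, k0, k1, n=20000):
    """Brute-force estimate: sample theta at midpoints of n bins over (-pi/2, pi/2)."""
    count = 0
    for i in range(n):
        t = math.tan(-math.pi / 2 + (i + 0.5) * math.pi / n)
        if (h0 + h1 * t) * (k0 + k1 * t) < 0:
            count += 1
    return count * math.pi / n

# Hypothetical image point, depth, FOEs and rotational error flow.
terms = linear_terms(10.0, 5.0, 8.0, foe=(0.0, 0.0), foe_hat=(3.0, -2.0),
                     beta_f=-0.5, gamma_e=0.0)
print(interval_size(*terms), interval_size_numeric(*terms))
```

The two values agree up to the sampling resolution, both when the slopes of $h$ and $k$ have the same sign and when they have opposite signs.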


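To obtain $R$, these loci of $t$ must be modified according to equation (10):

$$R = \begin{cases} |\tan^{-1} t| & \text{if } \dfrac{\partial h(\theta)}{\partial \tan\theta}\,\dfrac{\partial k(\theta)}{\partial \tan\theta} > 0 \\[2ex] \pi - |\tan^{-1} t| & \text{otherwise} \end{cases}$$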

Figure 11 illustrates the case $\gamma_e = 0$. Figure 11a shows the loci of $\tan^{-1} t$, whereas
Figure 11b shows the actual range $R$. The centers of all the circles lie on a straight line
perpendicular to the $t = 0$ locus.³ Several points can be made. First, there exists a band
in the image plane where $R$ is large. The direction of the band depends on the relative
magnitudes of two terms: $y_{0e}$ and $x_{0e} + \beta_f Z$. Second, for a correct FOE candidate ($\hat{x}_0 = x_0$),
the radii of the circles may be larger than for an incorrect FOE candidate. Therefore the
remarks made in Section 3.1.1 still apply even when scene patches with differently oriented
gradients are utilized in the motion estimation process. In particular, when the motion
errors and the depth of the patch are such that $\hat{x}_0 = x_0 + \beta_f Z$, $\hat{y}_0 = y_0 - \alpha_f Z$, and $\gamma_e = 0$,
the circles collapse. The interval $R$ becomes null everywhere, unless one takes into account
the quadratic terms in the rotational flow. The above conditions are readily satisfied by a
fronto-parallel scene and a camera with a small field of view. Therefore such a scenario is
likely to yield multiple solutions.

² The equation is of the second-order type $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$. To show that the
expression represents the equation of a circle, it suffices to note that $B = 0$, $A = C$, and
$D^2 + E^2 - 4AF = (1 + t^2)\left(x_{0e}^2 + y_{0e}^2 + \left(\frac{\gamma_e Z x_{0e}}{2}\right)^2 + Z^2\left(\beta_f + \frac{\gamma_e y_{0e}}{2}\right)^2 + 2\beta_f Z x_{0e}\right)$,
which can be shown to be positive by a standard discriminant check.

³ The radius $r(t)$ and center $(a(t), b(t))$ of a circle describing locus $t$ are given in the shifted coordinate
system (equations (11) and (12)) by

$$r(t) = \frac{\sqrt{(1 + t^2)\left(y_{0e}^2 + (x_{0e} + \beta_f Z)^2\right)}}{2|t|}, \qquad (a(t), b(t)) = \left(\frac{\beta_f Z}{2} + \frac{y_{0e}}{2t},\; -\frac{\beta_f Z}{2t} - \frac{x_{0e}}{2t}\right)$$

The centers of all the circles lie on a straight line, parametrically described by $(a(t), b(t))$, or equivalently,
$y = -\frac{x_{0e} + \beta_f Z}{y_{0e}}\left(x - \frac{\beta_f Z}{2}\right)$, which is perpendicular to the line defined by the locus $t = 0$.


Figure 11: (a) The circles describe loci of $t$ on the image plane ($\gamma_e = 0$). (b) The values of
$R$ are indicated by the numbers on the circles.


4  Uniqueness  Analysis 

In  the  previous  sections,  we  examined  the  problem  of  ambiguity  from  various  aspects.  The 
aim is to obtain an intuitive picture of various conditions that give rise to multiple solutions.
In this section, we shall formulate mathematically the conditions under which the
solution  is  unique.  Specifically,  we  ask  the  question:  if  we  are  allowed  to  arbitrarily  choose 
a  gradient  distribution,  can  we  find  a  gradient  field  such  that  the  actual  3D  motion  and  the 
estimated  motion  must  give  rise  to  different  normal  flow  fields,  whatever  the  corresponding 
scenes  in  view  are  (observing  the  positivity  of  depth  constraint,  of  course)?  If  the  answer 
is  affirmative,  then  the  actual  3D  motion  has  a  unique  solution.  The  reason  for  allowing 
an  arbitrary  gradient  distribution  is  to  enable  us  to  secure  a  uniqueness  definition  without 
having  to  assume  any  particular  form  of  gradient  distribution.  However,  it  turns  out  that 
the  consequence  of  demanding  positive  D  in  every  gradient  direction  amounts  to  demanding 
equal  optic  flow  everywhere,  as  we  show  in  the  sequel.  The  same  conclusion  can  be  obtained 
in  an  alternative  way,  as  shown  in  the  appendix. 

The analysis proceeds by examining the constraint imposed on the scene depth, arising
from the condition that $D$ must be positive for all gradient directions $\theta$. If in certain gradient
directions the constraint is impossible to meet, for instance, the depth is bounded from above
by a negative value, then different normal flow fields must arise along this gradient direction.
Thus, this particular motion estimate can be rejected. To simplify the algebra, we rotate the
x-y coordinate system so that in the new coordinate system, $y_{0e}$ is zero. Henceforth, $x_{0e}$,
$\alpha_e$ and $\beta_e$ are with respect to the new coordinate system. For convenience of presentation,
we introduce the following notation for the various terms found in the expression for $D$ (4):

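$$r(\theta) = \left(\beta_e\left(\frac{x^2}{f} + f\right) - \alpha_e\frac{xy}{f} - \gamma_e y\right) + \left(-\alpha_e\left(\frac{y^2}{f} + f\right) + \beta_e\frac{xy}{f} + \gamma_e x\right)\tan\theta = -u^{x}_{\mathrm{rot}_e} - u^{y}_{\mathrm{rot}_e}\tan\theta$$

$$p(\theta) = x - x_0 + (y - y_0)\tan\theta$$

$$q(\theta) = x - \hat{x}_0 + (y - \hat{y}_0)\tan\theta$$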

Furthermore, denote the angles where $p(\theta)$, $q(\theta)$, and $r(\theta)$ become zero as $\theta_p$, $\theta_q$ and $\theta_r$
respectively, that is, $p(\theta_p) = 0$, $q(\theta_q) = 0$, and $r(\theta_r) = 0$.

From this, it is immediately clear that $p(\theta_q) = -x_{0e}$ and $q(\theta_p) = x_{0e}$. Disregarding noise,
the expression (4) for $D > 0$ becomes

$$D = \frac{q(\theta)}{p(\theta) - r(\theta)Z} > 0 \tag{14}$$

which implies either of the following:


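$$r(\theta)Z < p(\theta) \quad \forall\theta \in \{\theta \mid q(\theta) > 0\} \tag{15}$$

$$r(\theta)Z > p(\theta) \quad \forall\theta \in \{\theta \mid q(\theta) < 0\} \tag{16}$$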

and  from  which  we  obtain  the  constraints  imposed  on  the  scene  depth.  Since  our  present 
goal  is  to  look  for  scene-independent  constraints,  we  are  not  interested  in  the  lower  bound 
on $Z$, as such constraints can always be satisfied by some scene points. Instead, we are
interested in situations where the upper bound becomes negative. Depending on the sign of
$r(\theta)$, both (15) and (16) can give rise to upper bounds on $Z$:


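$$Z < \frac{p(\theta)}{r(\theta)} \quad \forall\theta \in \{\theta \mid q(\theta) > 0 \text{ and } r(\theta) > 0\} \tag{17}$$

$$Z < \frac{p(\theta)}{r(\theta)} \quad \forall\theta \in \{\theta \mid q(\theta) < 0 \text{ and } r(\theta) < 0\} \tag{18}$$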

From (17) and (18), we can state the requirement for a negative upper bound to arise as
that of finding a gradient direction $\theta$ such that

$$\operatorname{sgn}(r(\theta)) = \operatorname{sgn}(q(\theta)) = -\operatorname{sgn}(p(\theta)) \tag{19}$$

Since $\frac{\partial p}{\partial \tan\theta} = \frac{\partial q}{\partial \tan\theta}$, $p(\theta)$ and $q(\theta)$ have different signs only when $\theta$ is between $\theta_p$ and $\theta_q$ (see
Figures 12 and 13). This satisfies one equality in (19).⁴ With regard to the necessary
relationship of $r(\theta)$ with $p(\theta)$ and $q(\theta)$, it is sufficient to consider the relationships at the
extremal angles $\theta_p$ and $\theta_q$. In a neighborhood around $\theta_p$, we obtain values $p(\theta) < 0$ and
values $p(\theta) > 0$. Thus, if $r(\theta_p)\,q(\theta_p) > 0$, then in some region around $\theta_p$, (19) must hold. If
the constraint $r(\theta_p)\,q(\theta_p) > 0$ does not hold, it is equally admissible to have $r(\theta_q)\,p(\theta_q) < 0$
at $\theta_q$. To summarize, we need either $r(\theta_p)\,q(\theta_p) > 0$ or $r(\theta_q)\,p(\theta_q) < 0$. The two cases are
respectively illustrated in Figure 12 and Figure 13.

⁴ For the special case where $x_{0e} = 0$, $p(\theta)$ and $q(\theta)$ always have the same sign. In this case, choose
$\theta = \theta_p$; then $p(\theta) = q(\theta) = 0$, and the upper bound becomes zero, clearly an impossibility. Henceforth, we
will always assume $x_{0e} \neq 0$.

[Figure 12 panels: (a) $-x_{0e} > 0$, $\theta_p < \theta_q < \theta_r$; (b) $-x_{0e} > 0$, $\theta_p < \theta_r < \theta_q$]

Figure 12: The three expressions $p(\theta)$, $q(\theta)$, and $r(\theta)$ are plotted as functions of $\tan\theta$. The
lightly shaded regions on the horizontal axis represent ranges of $\theta$ within which an upper
bound on $Z$ applies. The darkly shaded regions, if any, on the horizontal axis represent
ranges of $\theta$ within which this upper bound is negative. Here $r(\theta_p)\,q(\theta_p) > 0$ holds.
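
The endpoint test summarized above can be compared with a direct search for a gradient direction satisfying (19). The sketch below is an illustration only: it uses the expressions for $p$, $q$ and $r$ as linear functions of $\tan\theta$ in the rotated coordinate system ($\hat{y}_0 = y_0$), and the focal length, image point and error values are hypothetical.

```python
import math

def r_coeffs(x, y, f, alpha_e, beta_e, gamma_e):
    """r(theta) = r0 + r1*tan(theta), from the rotational error terms."""
    r0 = beta_e * (x * x / f + f) - alpha_e * x * y / f - gamma_e * y
    r1 = -alpha_e * (y * y / f + f) + beta_e * x * y / f + gamma_e * x
    return r0, r1

def endpoint_test(x, y, x0, x0_hat, y0, f, rot_err):
    """r(theta_p) q(theta_p) > 0 or r(theta_q) p(theta_q) < 0; requires y != y0."""
    r0, r1 = r_coeffs(x, y, f, *rot_err)
    x0e = x0 - x0_hat
    t_p = -(x - x0) / (y - y0)        # tan(theta_p), where p vanishes
    t_q = -(x - x0_hat) / (y - y0)    # tan(theta_q), where q vanishes
    # q(theta_p) = x0e and p(theta_q) = -x0e, as noted in the text.
    return (r0 + r1 * t_p) * x0e > 0 or (r0 + r1 * t_q) * (-x0e) < 0

def scan_test(x, y, x0, x0_hat, y0, f, rot_err, n=10001):
    """Search sampled directions for sgn(r) = sgn(q) = -sgn(p), i.e. condition (19)."""
    r0, r1 = r_coeffs(x, y, f, *rot_err)
    for i in range(1, n):
        t = math.tan(-math.pi / 2 + i * math.pi / n)
        p = x - x0 + (y - y0) * t
        q = x - x0_hat + (y - y0) * t
        r = r0 + r1 * t
        if r * q > 0 and r * p < 0:
            return True
    return False

# Hypothetical point and errors, with f = 1 so that coordinates are in focal lengths.
args = dict(x=0.2, y=0.3, x0=0.0, x0_hat=-0.1, y0=0.0, f=1.0)
for beta_e in (0.01, -0.01):
    rot = (0.0, beta_e, 0.0)
    print(beta_e, endpoint_test(rot_err=rot, **args), scan_test(rot_err=rot, **args))
```

For these values the two tests agree: a negative upper bound exists at this image point for one sign of the rotational error and not for the other.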


Figure 13: Same plots as in the previous figure, except that the slope of $r(\theta)$ is negative.
Here $r(\theta_q)\,p(\theta_q) < 0$ holds.

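Substituting the expressions for $r(\theta)$, $\theta_p$, and $\theta_q$, and noting that $p(\theta_q) = -x_{0e}$ and
$q(\theta_p) = x_{0e}$, we obtain our requirements as either of the following:

$$(-x_{0e})\left(-u^{x}_{\mathrm{rot}_e} + u^{y}_{\mathrm{rot}_e}\,\frac{x - x_0}{y - y_0}\right) < 0 \tag{20}$$

$$(-x_{0e})\left(-u^{x}_{\mathrm{rot}_e} + u^{y}_{\mathrm{rot}_e}\,\frac{x - \hat{x}_0}{y - y_0}\right) < 0 \tag{21}$$

Multiplying out the terms in both (20) and (21), the requirements become either (22) or (23):

$$-x_{0e}(y - y_0)\left(Ax^2 + Bxy + Cy^2 + Dx + Ey + F\right) > 0 \tag{22}$$

$$-x_{0e}(y - y_0)\left(\hat{A}x^2 + \hat{B}xy + \hat{C}y^2 + \hat{D}x + \hat{E}y + \hat{F}\right) > 0 \tag{23}$$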
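where

$$A = \gamma_e + \frac{\beta_e y_0}{f}, \qquad B = -\left(\frac{\alpha_e y_0}{f} + \frac{\beta_e x_0}{f}\right), \qquad C = \gamma_e + \frac{\alpha_e x_0}{f},$$

$$D = -(\alpha_e f + \gamma_e x_0), \qquad E = -(\beta_e f + \gamma_e y_0), \qquad F = \alpha_e f x_0 + \beta_e f y_0$$

and $\hat{A}, \hat{B}, \hat{C}, \hat{D}, \hat{E}, \hat{F}$ are similar to $A, B, C, D, E, F$, respectively, except that $x_0$ is replaced
by $\hat{x}_0$.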

Relationship  with  Motion  Flow  If  two  motion  solutions  are  to  be  disambiguated,  one 
of  the  two  inequalities  (22)  and  (23)  must  be  satisfied.  Conversely,  for  ambiguities  to  exist, 
both  of  the  following  must  be  true: 

$$x_{0e}(y - y_0)\left(Ax^2 + Bxy + Cy^2 + Dx + Ey + F\right) > 0 \tag{24}$$

$$x_{0e}(y - y_0)\left(\hat{A}x^2 + \hat{B}xy + \hat{C}y^2 + \hat{D}x + \hat{E}y + \hat{F}\right) > 0 \tag{25}$$

The  same  equations  expressed  in  a  spherical  coordinate  system  have  been  obtained  in  [3] 
for  the  cases  where  optic  flow  values  can  give  rise  to  ambiguous  solutions  subject  to  the 
visibility  constraint.  Thus  requiring  D  to  be  positive  in  all  gradient  directions  yields  the 
same  constraint  as  that  obtained  from  requiring  the  same  motion  flow  subject  to  the  visibility 
constraint.  In  [3],  it  was  shown  that  for  a  half-sphere  or  equivalently  an  infinitely  large  image 
plane,  the  regions  where  (22)  and  (23)  hold  are  always  non-empty.  Therefore  the  conclusion 
is  that  if  we  consider  the  constraint  of  positive  depth,  the  full  motion  field  on  a  half-sphere 
uniquely  constrains  the  3D  motion  independently  of  the  scene  in  view.  Of  course  for  a 
practical  imaging  system,  we  do  not  have  an  infinitely  large  image  plane.  In  the  following  we 
study  the  geometry  of  the  areas  defined  by  (24)  and  (25)  in  order  to  investigate  the  potential 
confusion  that  exists  between  different  motions  for  image  planes  of  limited  size. 

Geometry of the Negative Upper Bound Areas  For convenience of notation, we use
$f(x, y)$ to denote $Ax^2 + Bxy + Cy^2 + Dx + Ey + F$, $\hat{f}(x, y)$ to denote $\hat{A}x^2 + \hat{B}xy + \hat{C}y^2 + \hat{D}x +
\hat{E}y + \hat{F}$, and $g(y)$ to denote $-x_{0e}(y - y_0)$. To obtain the geometry of the regions prescribed
by (22) and (23), we first make the following observation: the equation $f(x, y) = 0$ describes
the locus on the image plane where the translational flow vectors defined by $(x_0, y_0)$ are
parallel to the rotational flow vectors defined by $(\alpha_e, \beta_e, \gamma_e)$, as can be readily verified by
$f(x, y) = (x - x_0, y - y_0) \cdot (-u^{y}_{\mathrm{rot}_e}, u^{x}_{\mathrm{rot}_e})$. In [9], this locus is termed the zero iso-motion contour
because the resultant flow vector could potentially be zero due to cancellation between the
translational and the rotational flow vectors. The locus describes conics on the image plane,
passing through the FOE $(x_0, y_0)$ and the point $\left(\frac{\alpha_e f}{\gamma_e}, \frac{\beta_e f}{\gamma_e}\right)$, which is where the axis of rotation
as defined by $(\alpha_e, \beta_e, \gamma_e)$ pierces the image plane. It separates the image plane into regions
where $f(x, y)$ is positive or negative. Similar arguments apply for $\hat{f}(x, y)$, except that the
FOE is moved to $(\hat{x}_0, \hat{y}_0)$.
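
The identity between the conic $f(x, y)$ and the dot-product form can be verified numerically. The sketch below assumes the rotational error flow implied by the expression for $r(\theta)$ given earlier, namely $u^{x}_{\mathrm{rot}_e} = \alpha_e xy/f - \beta_e(x^2/f + f) + \gamma_e y$ and $u^{y}_{\mathrm{rot}_e} = \alpha_e(y^2/f + f) - \beta_e xy/f - \gamma_e x$; the function names and numerical values are hypothetical.

```python
import math

def rot_flow(x, y, f, alpha_e, beta_e, gamma_e):
    """Rotational error flow (u_x, u_y) at image point (x, y); assumed sign convention."""
    u_x = alpha_e * x * y / f - beta_e * (x * x / f + f) + gamma_e * y
    u_y = alpha_e * (y * y / f + f) - beta_e * x * y / f - gamma_e * x
    return u_x, u_y

def conic_coeffs(x0, y0, f, alpha_e, beta_e, gamma_e):
    """Coefficients A..F of the zero iso-motion contour for a FOE at (x0, y0)."""
    A = gamma_e + beta_e * y0 / f
    B = -(alpha_e * y0 + beta_e * x0) / f
    C = gamma_e + alpha_e * x0 / f
    D = -(alpha_e * f + gamma_e * x0)
    E = -(beta_e * f + gamma_e * y0)
    F = alpha_e * f * x0 + beta_e * f * y0
    return A, B, C, D, E, F

def f_conic(x, y, x0, y0, f, rot):
    A, B, C, D, E, F = conic_coeffs(x0, y0, f, *rot)
    return A * x * x + B * x * y + C * y * y + D * x + E * y + F

def f_dot(x, y, x0, y0, f, rot):
    """(x - x0, y - y0) . (-u_y, u_x): zero where translation is parallel to rotation."""
    u_x, u_y = rot_flow(x, y, f, *rot)
    return (x - x0) * (-u_y) + (y - y0) * u_x

# Hypothetical focal length (pixels), FOE and rotational errors (radians per frame).
f, x0, y0 = 500.0, 40.0, -30.0
rot = (0.001, -0.002, 0.0005)
for (x, y) in [(0.0, 0.0), (100.0, 50.0), (-120.0, 80.0)]:
    print(f_conic(x, y, x0, y0, f, rot), f_dot(x, y, x0, y0, f, rot))
```

Both forms also vanish at the FOE and at the point $(\alpha_e f/\gamma_e, \beta_e f/\gamma_e)$, consistent with the statement that the conic passes through these two points.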
Similarly, the equation $g(y) = 0$ separates the image plane into two half-planes where
$g(y)$ is positive or negative. It passes through the FOEs of both the real and the estimated
motion, since the coordinate system is chosen such that $\hat{y}_0 = y_0$. Thus the line defines the
locus of points where the real translational flow is parallel to the estimated translational
flow.


Figure  14:  The  geometry  of  the  negative  upper  bound  areas  on  the  image  plane  represented 
by  all  the  shaded  areas  in  the  figures.  These  are  also  the  areas  where  the  motion  fields  arising 
from  the  erroneous  motion  estimates  must  differ  from  the  true  motion  fields  or,  equivalently, 
there  exists  some  gradient  direction  along  which  the  normal  flows  must  be  different. 

Some examples of regions defined by the two conics $f(x, y)$, $\hat{f}(x, y)$ and the line $g(y)$
are shown in Figure 14. The regions where (22) or (23) holds, that is, $g(y)f(x, y) > 0$
or $g(y)\hat{f}(x, y) > 0$, are represented by all the shaded areas. Depending on the values of
$(x_0, y_0)$, $(\hat{x}_0, \hat{y}_0)$ and $(\alpha_e, \beta_e, \gamma_e)$, the curves $f(x, y) = 0$ and $\hat{f}(x, y) = 0$ could be ellipses,
hyperbolas, parabolas, or one of the degenerate forms. Figure 14 illustrates the cases where
the conics are ellipses and hyperbolas. The limiting case where an infinitely large image
plane is needed for uniqueness of the solution corresponds to the case where $f(x, y)$ and
$\hat{f}(x, y)$ degenerate into two intersecting lines; the constraints (22) and (23) then simplify to
$x > a$, where $a$ may tend to $\infty$. Such cases, to be amplified in Section 5.1, pose problems
for practical imaging systems where the field of view is less than 180°.

5  Practical  Implications 

5.1  FOV  for  Accurate  FOE  Estimation 

In  the  preceding  section,  we  concluded  that  if  we  have  a  half-sphere,  or  equivalently  an 
infinitely  large  image  plane,  then  by  considering  the  visibility  constraint  and  the  full  motion 
field,  we  obtain  unique  3D  motion  independently  of  the  scene  in  view.  However,  a  more 
interesting  case  is  the  one  where  we  do  not  have  a  half-sphere;  this  is  a  problem  that  has 
been  of  much  concern  among  motion  researchers:  the  resultant  potential  confusion  that  exists 
between  translation  and  rotation.  Here,  we  use  (22)  and  (23)  to  determine  the  exact  field  of 
view  beyond  which  potential  confusion  can  be  theoretically  disambiguated,  independently  of 
the scene depth distribution. Consider the case investigated earlier, namely, $W(x_{0e}, y_{0e}) =
-\lambda(\beta_e, -\alpha_e)$ for some positive $\lambda$ or, in the rotated coordinate system, $y_{0e} = \alpha_e = \gamma_e = 0$
and [...] $> 0$. Then it can be shown that $f(x, y) = 0$ and $\hat{f}(x, y) = 0$ become respectively

$$\frac{\beta_e y_0}{f}x^2 - \frac{\beta_e x_0}{f}xy - \beta_e f(y - y_0) = 0$$

$$\frac{\beta_e y_0}{f}x^2 - \frac{\beta_e \hat{x}_0}{f}xy - \beta_e f(y - y_0) = 0$$

If $y_0$ is non-zero, these are equations describing hyperbolas. As $y_0$ becomes zero, the loci of
both $f(x, y) = 0$ and $\hat{f}(x, y) = 0$ degenerate into two intersecting straight lines:

$$x = -\frac{f^2}{x_0}, \quad y = y_0 \qquad \text{for } f(x, y) = 0 \tag{26}$$

$$x = -\frac{f^2}{\hat{x}_0}, \quad y = y_0 \qquad \text{for } \hat{f}(x, y) = 0 \tag{27}$$

Figure 15: Diagram illustrating the areas where the motion fields arising from the
erroneous motion estimates must differ from the true motion field, for the case where
$W(x_{0e}, y_{0e}) = -\lambda(\beta_e, -\alpha_e)$ for some positive $\lambda$. The diagrams from left to right illustrate the
change in these areas for increasingly small $y_0$. As can be seen, as $y_0$ tends to zero, both
$f(x, y) = 0$ and $\hat{f}(x, y) = 0$ degenerate into two pairs of intersecting lines.

Figure 15 shows the evolution of these curves as $y_0$ approaches zero. As can be seen in
the degenerate case, one of the lines $f(x, y) = 0$ (and $\hat{f}(x, y) = 0$) becomes coincident with
$g(y) = 0$, whereas the other line (the vertical line) usually moves outside the image plane for a
typical imaging system. Therefore we find that as $y_0$ approaches zero, the area on the image
plane where the two motion solutions can be theoretically disambiguated (independently of
scene depth) becomes smaller. In other words, the errors in the FOE $(x_{0e}, y_{0e})$ are most
likely to be parallel to $(x_0, y_0)$, so that along the direction where the projection of $(x_{0e}, y_{0e})$
is zero, the projection of $(x_0, y_0)$ is also zero. Consider the case where $x_0$ is also zero. Then
one of the lines resulting from $f(x, y) = 0$ lies at infinity, whereas that resulting from $\hat{f}(x, y) = 0$,
if it were to approach the true solution ($\hat{x}_0 \to 0$), would also lie well outside the image plane.
For instance, even allowing a 5° error in $\hat{x}_0$, we still require a field of view of 170° to be able
to "see" the line $x = -f^2/\hat{x}_0$. In other words, on the basis of equal motion flow, together with
the visibility constraint, if the FOE is at (0, 0) and the depths of the scene are appropriately
chosen, then all $\hat{x}_0$ estimates with less than 5° error are indistinguishable in a system with
a field of view of 170°.
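
The 170° figure can be reproduced with a short calculation. The sketch below assumes that the FOE error is expressed as a visual angle, so that $|\hat{x}_0| = f\tan(5°)$, and that the vertical line of the degenerate conic lies at a distance $f^2/|\hat{x}_0|$ from the image center, which must fall inside the field of view for the line to be seen. The function name is hypothetical.

```python
import math

def required_fov_deg(foe_error_deg, f=1.0):
    """Full field of view (degrees) needed to see the line at distance f^2 / |x0_hat|."""
    x0_hat = f * math.tan(math.radians(foe_error_deg))   # FOE error in image units
    line_distance = f * f / abs(x0_hat)                  # distance of the vertical line
    half_angle = math.atan(line_distance / f)            # visual angle of that line
    return 2.0 * math.degrees(half_angle)

for err in (1.0, 5.0, 10.0, 20.0):
    print(f"FOE error {err:5.1f} deg -> required field of view {required_fov_deg(err):6.1f} deg")
```

Since $\tan^{-1}(1/\tan\epsilon) = 90° - \epsilon$, the required field of view is $180° - 2\epsilon$, which gives 170° for a 5° error and is independent of the focal length.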
Figure 16: The negative distortion areas of $x_0$, the true FOE (represented by shaded regions),
and its neighbor $x_0'$ a units away (represented by dotted regions), under varying values of
[...]. $\left(\frac{Z}{W}\right)_{\min}$, $\left(\frac{Z}{W}\right)_{\max}$, and $\left(\frac{Z}{W}\right)_{\mathrm{mid}}$ represent the minimum, the maximum, and the average
scaled depth in the scene respectively. Our criterion (see the text) requires that the negative
distortion area of $x_0$ should be smaller than that of $x_0'$.


5.2  Inertial  Sensor  for  FOE  Estimation 

Vieville and Faugeras [22] describe how inertial and visual cues can be combined. Angular
velocity information is provided by low-cost gyrometers, with a precision of 0.04°/s (see
footnote 5). As analyzed in Section 3.1.1, errors in the rotational estimates lead to errors in
the FOE estimates. Thus we would like to know the degree of precision required of an inertial
sensor, so that the position of the FOE can be determined within some small bound $a$. In this
section we apply the iso-distortion contours to this problem.

[Footnote 5] The human inertial sensor is comparable in resolution to such low-cost sensors, the
main difference being that the human angular sensor is an angular accelerometer [21].

For the present purpose, we content ourselves with a system which does not have a large
FOV and is constrained in such a way that there is no rotation about the Z-axis. The second
condition can be relaxed if the rotation about the Z-axis can be accurately determined so
that $\gamma_e$ plays little role in the following analysis. Furthermore, we do not consider the finite
size of the image plane, which might introduce bias due to the position of the FOE in the
image, and we assume that the gradient distribution is uniform, that is, the image gradients
are uniformly distributed in every direction and at every depth within the depth range of
a given scene. Now, given any rotational error $(\alpha_e, \beta_e)$ due to inaccuracies in the inertial
sensor, we define a new coordinate system such that the new x-axis is along the direction
given by $(\beta_e, -\alpha_e)$. Henceforth, $x_{0e}$, $y_{0e}$, $\alpha_e$, $\beta_e$ are taken with respect to the new coordinate
system; in other words, $\alpha_e$ is zero.
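The change of coordinates can be illustrated with a short sketch. It assumes that the pair $(\beta_e, -\alpha_e)$ transforms like an ordinary 2D image vector when the image axes are rotated; the function name `to_error_frame` and the sample error values are hypothetical and not taken from the paper.

```python
import math

def to_error_frame(alpha_e, beta_e):
    """Rotate the image axes so the new x-axis lies along (beta_e, -alpha_e).

    Returns (theta, alpha_e_new, beta_e_new): the rotation angle and the
    rotational errors expressed in the rotated frame.
    Assumption: (beta_e, -alpha_e) behaves as a plain 2D vector.
    """
    dx, dy = beta_e, -alpha_e              # direction of the new x-axis
    theta = math.atan2(dy, dx)
    c, s = math.cos(theta), math.sin(theta)
    # Components of (beta_e, -alpha_e) in the rotated frame.
    new_dx = c * dx + s * dy
    new_dy = -s * dx + c * dy
    # In the new frame (beta_e', -alpha_e') = (new_dx, new_dy).
    return theta, -new_dy, new_dx

# Hypothetical sample errors, in rad/s.
theta, alpha_new, beta_new = to_error_frame(math.radians(0.03), math.radians(0.04))
```

In the rotated frame the whole rotational error is carried by $\beta_e$, whose magnitude equals the length of the original error vector, while $\alpha_e$ vanishes up to rounding.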

The following analysis is based on the fact that along any gradient direction $n$, the
size of the negative distortion region always decreases at first as the FOE estimate begins to
deviate from the true FOE (see Figure 7). However, the size of the negative distortion region
reaches a minimum at some distance from the true FOE. This minimum occurs when the
error $(x_{0e}, y_{0e}) \cdot n$ is such that

$$-\frac{(x_{0e}, y_{0e}) \cdot n}{(\beta_e f, 0) \cdot n}$$

is equal to the value of the middle of the depth range in the scene, referred to as
$(Z/W)_{\mathrm{mid}}$ in Figure 16. Equivalently, the $D = 1$ contour, given by the equation

$$Z = -\frac{(x_{0e}, y_{0e}) \cdot n}{(\beta_e f, 0) \cdot n},$$

passes through the middle of the depth range in the scene (see Figure 7d). Beyond this error,
the negative area increases monotonically with the size of the FOE error.

Given this relationship, what is the FOE error $(x_{0e}, y_{0e})$ that minimizes the negative
areas in all gradient directions? Since $\alpha_e = 0$, if we also set $y_{0e} = 0$ and
$x_{0e} = -\beta_e f (Z/W)_{\mathrm{mid}}$, then the negative distortion areas in every gradient direction
will be minimized, since the depth $Z$ given by

$$Z = -\frac{(x_{0e}, 0) \cdot n}{(\beta_e f, 0) \cdot n}$$

occurs at $(Z/W)_{\mathrm{mid}}$ for every $n$. Thus, the overall negative distortion area is
minimized. To have the desired FOE accuracy, it therefore suffices to ensure that the FOE
error $x_{0e}$ given above is well within the required error bound $a$. In other words:


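$$\left| \beta_e f \left( \frac{Z}{W} \right)_{\mathrm{mid}} \right| < a$$

Thus the desired degree of precision needed in an inertial sensor can be written as follows:

$$|\beta_e| < \frac{a}{f \, (Z/W)_{\mathrm{mid}}} \qquad (28)$$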
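The direction-independence used in this argument can be checked numerically. The sketch below evaluates the $D = 1$ depth, read from the text as $Z = -((x_{0e}, y_{0e}) \cdot n) / ((\beta_e f, 0) \cdot n)$, over a set of gradient directions. The focal length $f = 1$, the sample value $(Z/W)_{\mathrm{mid}} = 10$, and the function name `d1_depth` are illustrative assumptions.

```python
import math

def d1_depth(x0e, y0e, beta_e, f, theta):
    """Depth on the D = 1 contour for gradient direction n = (cos theta, sin theta)."""
    nx, ny = math.cos(theta), math.sin(theta)
    return -(x0e * nx + y0e * ny) / (beta_e * f * nx)

f = 1.0                          # assumed focal length (image units)
beta_e = math.radians(0.04)      # rotational error in rad/s
zw_mid = 10.0                    # assumed middle of the scaled depth range
x0e = -beta_e * f * zw_mid       # FOE error chosen as in the text

# Sample gradient directions, skipping those with n_x = 0 where the ratio is undefined.
angles = [math.radians(t) for t in range(0, 360, 15) if t % 180 != 90]

# With y0e = 0 the D = 1 depth is the same in every direction.
aligned = [d1_depth(x0e, 0.0, beta_e, f, t) for t in angles]

# With a nonzero y0e the D = 1 depth depends on the direction.
skewed = [d1_depth(x0e, 0.001, beta_e, f, t) for t in angles]
```

With $y_{0e} = 0$ every entry of `aligned` equals `zw_mid` up to rounding, whereas the entries of `skewed` vary with the direction, so no single depth is matched for all $n$.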


If angular velocity information is provided by a low-cost gyrometer with a typical
precision of 0.04°/s, then (28) imposes a constraint on the type of scene in which such a system
can operate. Substituting the values $\beta_e = 0.04$°/s ($\approx 0.0007$ rad/s) and $a = 1°$ of visual
arc, say, we obtain $(Z/W)_{\mathrm{mid}} < 25\ \mathrm{s}^{-1}$. A robot moving in an indoor environment, with a
typical walking speed of 1 to 2 m/s, would amply satisfy these criteria. Thus it would seem
that such a low-cost inertial sensor measures up well to the task of accurate FOE estimation
(here, within 1° of visual field) in an indoor environment.
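The figure of 25 can be reproduced with a few lines of arithmetic. This is a sketch only: it reads bound (28) as $|\beta_e| < a / (f (Z/W)_{\mathrm{mid}})$, assumes that an error of 1° of visual arc corresponds to $a = f \tan 1°$ in image units, and uses hypothetical function names.

```python
import math

def max_rotation_error(a, f, zw_mid):
    """Largest admissible |beta_e| for a given FOE bound a, as in bound (28)."""
    return a / (f * zw_mid)

def max_scaled_depth(a, f, beta_e):
    """Bound (28) solved for (Z/W)_mid at a given sensor precision beta_e."""
    return a / (f * abs(beta_e))

f = 1.0                                   # assumed focal length (image units)
a = f * math.tan(math.radians(1.0))       # assumed image-plane size of 1 degree
beta_e = math.radians(0.04)               # 0.04 deg/s, about 0.0007 rad/s

limit = max_scaled_depth(a, f, beta_e)    # approximately 25
```

Feeding `limit` back into `max_rotation_error` returns the sensor precision, which confirms that the two forms of the bound are consistent.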


6  Experiments 

This section presents experimental results on synthetic and real images. First, we use
synthetic images to determine the extent to which a low-cost inertial sensor is effective for FOE
estimation. The conditions of the experiments were as follows:

1) The observer’s focus of expansion was located at $x_0 = 0$, $y_0 = 0$. The field of view was
36°.
2) The world features were placed randomly within the following depth ranges for each
experiment: 1 < Z < 20, 1 < Z < 50, 10 < Z < 100. The forward speed W was 1.8
m/s in all three cases.

3)  Noise  in  the  normal  flow  was  introduced  as  Gaussian  distributed  perturbations  of  the 
magnitude.  The  errors  ranged  from  0%  to  10%,  and  they  were  not  correlated  either 
spatially  or  temporally. 

4) The estimates of $\alpha$, $\beta$ and $\gamma$ had errors up to 0.04°/s.

5) Four FOE candidates were tested, with $y_{0e} = 0$ and $x_{0e}$ ranging from -1° to 2° of
visual field.
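The conditions listed above can be turned into a small simulation of the negative-depth test. The sketch below is not the authors' code. It assumes the standard pinhole motion-field equations with one particular sign convention, a unit focal length, 500 features per trial, a hypothetical true rotation, relative Gaussian noise on the normal-flow magnitude, and rotation-estimate errors drawn uniformly within ±0.04°/s. All function names are illustrative.

```python
import math
import random

def rotational_flow(x, y, f, alpha, beta, gamma):
    """Image flow at (x, y) induced by angular velocity (alpha, beta, gamma)."""
    u = alpha * x * y / f - beta * (x * x / f + f) + gamma * y
    v = alpha * (y * y / f + f) - beta * x * y / f - gamma * x
    return u, v

def make_measurements(rng, n, f, fov_deg, z_range, W, foe, rot, noise):
    """Draw n normal-flow measurements (x, y, nx, ny, u_n) for a random scene."""
    half = f * math.tan(math.radians(fov_deg / 2.0))
    data = []
    for _ in range(n):
        x, y = rng.uniform(-half, half), rng.uniform(-half, half)
        z = rng.uniform(*z_range)
        phi = rng.uniform(0.0, 2.0 * math.pi)       # uniform gradient direction
        nx, ny = math.cos(phi), math.sin(phi)
        ur, vr = rotational_flow(x, y, f, *rot)
        u = (W / z) * (x - foe[0]) + ur              # translational + rotational flow
        v = (W / z) * (y - foe[1]) + vr
        u_n = (u * nx + v * ny) * (1.0 + rng.gauss(0.0, noise))  # noisy magnitude
        data.append((x, y, nx, ny, u_n))
    return data

def count_negative_depths(data, f, foe_hat, rot_hat):
    """Count measurements whose recovered scaled depth Z/W is negative."""
    count = 0
    for x, y, nx, ny, u_n in data:
        ur, vr = rotational_flow(x, y, f, *rot_hat)
        derotated = u_n - (ur * nx + vr * ny)
        radial = (x - foe_hat[0]) * nx + (y - foe_hat[1]) * ny
        # Recovered Z/W = radial / derotated; only its sign is needed.
        if radial * derotated < 0.0:
            count += 1
    return count

def run_experiment(z_range, trials=40, n=500, noise=0.10, seed=0):
    """Total negative-depth counts for x0e = -1, 0, 1, 2 degrees over all trials."""
    rng = random.Random(seed)
    f, W, foe = 1.0, 1.8, (0.0, 0.0)
    rot = tuple(math.radians(d) for d in (0.5, -0.3, 0.0))  # hypothetical true rotation
    max_err = math.radians(0.04)
    candidates = [f * math.tan(math.radians(d)) for d in (-1.0, 0.0, 1.0, 2.0)]
    totals = [0] * len(candidates)
    for _ in range(trials):
        data = make_measurements(rng, n, f, 36.0, z_range, W, foe, rot, noise)
        rot_hat = tuple(r + rng.uniform(-max_err, max_err) for r in rot)
        for i, x0_hat in enumerate(candidates):
            totals[i] += count_negative_depths(data, f, (x0_hat, 0.0), rot_hat)
    return totals
```

Calling `run_experiment((1.0, 20.0))`, `run_experiment((1.0, 50.0))`, and `run_experiment((10.0, 100.0))` gives one total per FOE candidate for each depth range. With exact motion estimates and no noise, the true FOE produces no negative depths, which is a useful sanity check on the sign convention.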

Numbers  of  negative  depths  for  the  four  FOE  candidates  were  recorded  over  40  random 
trials,  and  the  results  were  represented  as  four  different  curves  in  each  of  the  three  graphs 
in  Figure  17.  As  can  be  seen  by  comparing  Figures  17a,  b,  and  c,  the  effect  of  increasing 
depth  is  to  gradually  degrade  the  efficacy  of  the  inertial  sensor  estimates.  Only  in  Figure  17a 
(indoor  scene)  is  the  correct  FOE  candidate  well  separated  from  the  other  candidates.  For 
images  with  greater  scene  distances,  the  local  nature  of  the  visibility  constraint  makes  the 
test sensitive to the distribution of features and the effect of noise. Thus, the results support
our criterion in Section 5.2, namely, that $(Z/W)_{\mathrm{mid}}$ must be less than 25 s$^{-1}$.
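As a rough cross-check, the three synthetic depth ranges can be compared with the limit of 25 from Section 5.2. The sketch assumes that $(Z/W)_{\mathrm{mid}}$ may be approximated by the midpoint of each depth range divided by the forward speed, and that the resulting numbers are directly comparable with the quoted limit. The helper name is hypothetical.

```python
def mid_scaled_depth(z_min, z_max, W):
    """Midpoint of the depth range divided by the forward speed W."""
    return 0.5 * (z_min + z_max) / W

LIMIT = 25.0    # bound on (Z/W)_mid quoted in Section 5.2
W = 1.8         # forward speed used in the synthetic experiments

ranges = {"a": (1, 20), "b": (1, 50), "c": (10, 100)}
scaled = {k: mid_scaled_depth(lo, hi, W) for k, (lo, hi) in ranges.items()}
within = {k: v < LIMIT for k, v in scaled.items()}
```

Under these assumptions the scaled depths are roughly 5.8, 14.2, and 30.6. Only range (c) exceeds the limit, and range (a) lies below it by the widest margin, which is in line with Figure 17a being the only case in which the correct candidate is well separated.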

These results are further confirmed by experiments performed on real images (Figure 18a)
of an indoor scene with objects about 4-5 m away. Here the different depth ranges
were simulated by changing the forward speed $W$, so that the mean scaled depth $(Z/W)_{\mathrm{mid}}$
was different in each case. Estimates of the rotational velocities were again obtained from
an inertial sensor with an accuracy of 0.04°/s. Figures 18b, c, and d show the numbers of
negative depths obtained, in terms of level curves, as the FOE estimates move away from
the true FOE location, each figure corresponding to a different value of $(Z/W)_{\mathrm{mid}}$. The “sinks”
of the level curves represent the FOE candidates that yield the minimum number of negative depth
values, while the true FOE is represented by the cross. It can be seen that the errors become
significant when $(Z/W)_{\mathrm{mid}}$ increases.

7  Conclusions  and  Future  Directions 

The  extraction  of  3D  motion  and  shape  from  a  sequence  of  images  represents  an  important 
problem  in  computational  vision  that  has  attracted  much  attention.  When  an  estimate  of 
3D  motion  is  available,  it  can  be  used  with  image  motion  measurements  to  estimate  the 
structure  (relative  depth)  of  the  scene  in  view.  In  this  paper  we  have  shown  that  when  an 
error  exists  in  the  3D  motion  estimate,  the  computed  structure  of  the  scene  is  distorted,  and 
we  have  characterized  this  distortion  by  the  iso-distortion  loci  in  space.  The  distortion  of  the 
depths  of  scene  points  depends  not  only  on  the  error  involved  in  the  3D  motion,  but  also  on 
the image direction along which motion measurements are made. This result calls for a
re-evaluation of scene reconstruction algorithms based on multiple views. The structure of the
distortion  space  has  also  allowed  us  to  present  a  number  of  geometric  arguments  regarding 
the  inherent  ambiguity  in  image  sequences  as  far  as  3D  motion  estimation  is  concerned. 
Figure 17: Number of negative depths over 40 trials for different depth ranges: (a) 1 < Z < 20; (b) 1 < Z < 50; (c) 10 < Z < 100. Field of view: 36°. Forward speed: 1.8 m/s. Error in $\beta$: 0.04°/s. Noise: 10%. The four curves correspond to $x_{0e} = -1°$, $0°$, $1°$, and $2°$.


The  framework  introduced  here  has  also  been  used  in  the  study  of  other  problems  related 
to  the  perception  of  shape  and  independent  motion.  In  [11]  it  was  used  to  explain  the 
psychophysics  of  the  distortion  of  visual  space  experienced  by  human  observers  from  stereo 
or  motion.  In  [4]  algorithms  were  developed  for  independent  motion  detection  by  a  moving 
observer  that  exploit  the  relationship  among  the  distortion  spaces  in  the  time  domain. 
Appendix A  Demanding that D be positive in all gradient directions is equivalent to demanding equal motion flow

The fact that requiring $D$ to be positive in all gradient directions yields the same constraint
as requiring equal motion flow can be proved as follows. Denote by $u_t$ and $u_r$ the actual
translational and rotational flow, $u_t'$ the unit vector emanating from the estimated FOE,
and $u_r'$ the estimated rotational flow. Requiring $D$ to be positive in all gradient directions
$\theta_i$ means that for all $\theta_i$ between 0° and 360°, there exists a positive $\lambda_i$ such that

$$(u_r' + \lambda_i u_t') \cdot (\cos\theta_i, \sin\theta_i) = (u_r + u_t) \cdot (\cos\theta_i, \sin\theta_i)$$

Hence for all $\theta_i$

$$\lambda_i = \frac{(u_t + (u_r - u_r')) \cdot (\cos\theta_i, \sin\theta_i)}{u_t' \cdot (\cos\theta_i, \sin\theta_i)} > 0$$

This can only be true if the vectors $(u_t + (u_r - u_r'))$ and $u_t'$ are in the same direction. That
is, there exists some positive $\lambda$ such that

$$u_r' + \lambda u_t' = u_r + u_t$$

Thus we have demanded that the two motion fields be the same.
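The key step of this argument, that the ratio of projections is positive in every direction only when the two vectors point the same way, can be illustrated numerically. The sketch below samples unit directions and tests the sign of the ratio. Here `p` stands for $u_t + (u_r - u_r')$ and `q` for $u_t'$; the function name, the sampling density, and the tolerance are assumptions made for illustration.

```python
import math

def positive_in_all_directions(p, q, steps=360, eps=1e-9):
    """Check whether (p . n) / (q . n) > 0 for sampled unit vectors n.

    Directions with |q . n| < eps are skipped because the ratio is
    undefined there.
    """
    for k in range(steps):
        t = 2.0 * math.pi * k / steps
        nx, ny = math.cos(t), math.sin(t)
        num = p[0] * nx + p[1] * ny
        den = q[0] * nx + q[1] * ny
        if abs(den) < eps:
            continue
        # The ratio num / den is positive exactly when num * den is positive.
        if num * den <= 0.0:
            return False
    return True

same_direction = positive_in_all_directions((2.0, 4.0), (1.0, 2.0))
perpendicular = positive_in_all_directions((1.0, 0.0), (0.0, 1.0))
opposite = positive_in_all_directions((-1.0, -2.0), (1.0, 2.0))
```

Only the first pair, where `p` is a positive multiple of `q`, passes the test. The perpendicular and opposite pairs each fail in some sampled direction.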